Modified: tomcat/jk/trunk/native/iis/pcre/doc/pcre.txt
URL: http://svn.apache.org/viewvc/tomcat/jk/trunk/native/iis/pcre/doc/pcre.txt?rev=1725105&r1=1725104&r2=1725105&view=diff
==============================================================================
--- tomcat/jk/trunk/native/iis/pcre/doc/pcre.txt (original)
+++ tomcat/jk/trunk/native/iis/pcre/doc/pcre.txt Sun Jan 17 17:23:28 2016
@@ -5000,7 +5000,8 @@ BACKSLASH
        appearance  of non-printing characters, apart from the binary zero that
        terminates a pattern, but when a pattern  is  being  prepared  by  text
        editing,  it  is  often  easier  to  use  one  of  the following escape
-       sequences than the binary character it represents:
+       sequences than the binary character it represents.  In an ASCII or Uni-
+       code environment, these escapes are as follows:
 
          \a        alarm, that is, the BEL character (hex 07)
          \cx       "control-x", where x is any ASCII character
@@ -5016,55 +5017,67 @@ BACKSLASH
          \x{hhh..} character with hex code hhh.. (non-JavaScript mode)
          \uhhhh    character with hex code hhhh (JavaScript mode only)
 
-       The precise effect of \cx on ASCII characters is as follows: if x is  a
-       lower  case  letter,  it  is converted to upper case. Then bit 6 of the
+       The  precise effect of \cx on ASCII characters is as follows: if x is a
+       lower case letter, it is converted to upper case. Then  bit  6  of  the
        character (hex 40) is inverted. Thus \cA to \cZ become hex 01 to hex 1A
-       (A  is  41, Z is 5A), but \c{ becomes hex 3B ({ is 7B), and \c; becomes
-       hex 7B (; is 3B). If the data item (byte or 16-bit value) following  \c
-       has  a  value greater than 127, a compile-time error occurs. This locks
+       (A is 41, Z is 5A), but \c{ becomes hex 3B ({ is 7B), and  \c;  becomes
+       hex  7B (; is 3B). If the data item (byte or 16-bit value) following \c
+       has a value greater than 127, a compile-time error occurs.  This  locks
        out non-ASCII characters in all modes.
 
-       The \c facility was designed for use with ASCII  characters,  but  with
-       the  extension  to  Unicode it is even less useful than it once was. It
-       is, however, recognized when PCRE is compiled  in  EBCDIC  mode,  where
-       data  items  are always bytes. In this mode, all values are valid after
-       \c. If the next character is a lower case letter, it  is  converted  to
-       upper  case.  Then  the  0xc0  bits  of the byte are inverted. Thus \cA
-       becomes hex 01, as in ASCII (A is C1), but because the  EBCDIC  letters
-       are  disjoint,  \cZ becomes hex 29 (Z is E9), and other characters also
-       generate different values.
-
-       After \0 up to two further octal digits are read. If  there  are  fewer
-       than  two  digits,  just  those  that  are  present  are used. Thus the
-       sequence \0\x\07 specifies two binary zeros followed by a BEL character
-       (code  value 7). Make sure you supply two digits after the initial zero
+       When PCRE is compiled in EBCDIC mode, \a, \e, \f, \n, \r, and \t gener-
+       ate the appropriate EBCDIC code values. The \c escape is  processed  as
+       specified for Perl in the perlebcdic document. The only characters that
+       are allowed after \c are A-Z, a-z, or one of @, [, \, ], ^,  _,  or  ?.
+       Any  other  character  provokes a compile-time error. The sequence \c@
+       encodes character code 0; the letters (in either case)  encode  charac-
+       ters 1-26 (hex 01 to hex 1A); [, \, ], ^, and _ encode characters 27-31
+       (hex 1B to hex 1F), and \c? becomes either 255 (hex FF) or 95 (hex 5F).
+
+       Thus, apart from \c?, these escapes generate the  same  character  code
+       values  as  they do in an ASCII environment, though the meanings of the
+       values mostly differ. For example, \cG always generates code  value  7,
+       which is BEL in ASCII but DEL in EBCDIC.
+
+       The  sequence  \c? generates DEL (127, hex 7F) in an ASCII environment,
+       but because 127 is not a control character in  EBCDIC,  Perl  makes  it
+       generate  the  APC character. Unfortunately, there are several variants
+       of EBCDIC. In most of them the APC character has  the  value  255  (hex
+       FF),  but  in  the one Perl calls POSIX-BC its value is 95 (hex 5F). If
+       certain other characters have POSIX-BC values, PCRE makes \c?  generate
+       95; otherwise it generates 255.
+
+       After  \0  up  to two further octal digits are read. If there are fewer
+       than two digits, just  those  that  are  present  are  used.  Thus  the
+       sequence \0\x\015 specifies two binary zeros followed by a CR character
+       (code value 13). Make sure you supply two digits after the initial zero
        if the pattern character that follows is itself an octal digit.
 
-       The escape \o must be followed by a sequence of octal digits,  enclosed
-       in  braces.  An  error occurs if this is not the case. This escape is a
-       recent addition to Perl; it provides way of specifying  character  code
-       points  as  octal  numbers  greater than 0777, and it also allows octal
+       The  escape \o must be followed by a sequence of octal digits, enclosed
+       in braces. An error occurs if this is not the case. This  escape  is  a
+       recent addition to Perl; it provides a way of specifying character code
+       points as octal numbers greater than 0777, and  it  also  allows  octal
        numbers and back references to be unambiguously specified.
 
        For greater clarity and unambiguity, it is best to avoid following \ by
        a digit greater than zero. Instead, use \o{} or \x{} to specify charac-
-       ter numbers, and \g{} to specify back references. The  following  para-
+       ter  numbers,  and \g{} to specify back references. The following para-
        graphs describe the old, ambiguous syntax.
 
        The handling of a backslash followed by a digit other than 0 is compli-
-       cated, and Perl has changed in recent releases, causing  PCRE  also  to
+       cated,  and  Perl  has changed in recent releases, causing PCRE also to
        change. Outside a character class, PCRE reads the digit and any follow-
-       ing digits as a decimal number. If the number is less  than  8,  or  if
-       there  have been at least that many previous capturing left parentheses
-       in the expression, the entire sequence is taken as a back reference.  A
-       description  of how this works is given later, following the discussion
+       ing  digits  as  a  decimal number. If the number is less than 8, or if
+       there have been at least that many previous capturing left  parentheses
+       in  the expression, the entire sequence is taken as a back reference. A
+       description of how this works is given later, following the  discussion
        of parenthesized subpatterns.
 
-       Inside a character class, or if  the  decimal  number  following  \  is
+       Inside  a  character  class,  or  if  the decimal number following \ is
        greater than 7 and there have not been that many capturing subpatterns,
-       PCRE handles \8 and \9 as the literal characters "8" and "9", and  oth-
+       PCRE  handles \8 and \9 as the literal characters "8" and "9", and oth-
        erwise re-reads up to three octal digits following the backslash, using
-       them to generate a data character.  Any  subsequent  digits  stand  for
+       them  to  generate  a  data character.  Any subsequent digits stand for
        themselves. For example:
 
          \040   is another way of writing an ASCII space
@@ -5082,31 +5095,31 @@ BACKSLASH
          \81    is either a back reference, or the two
                    characters "8" and "1"
 
-       Note  that octal values of 100 or greater that are specified using this
-       syntax must not be introduced by a leading zero, because no  more  than
+       Note that octal values of 100 or greater that are specified using  this
+       syntax  must  not be introduced by a leading zero, because no more than
        three octal digits are ever read.
 
-       By  default, after \x that is not followed by {, from zero to two hexa-
-       decimal digits are read (letters can be in upper or  lower  case).  Any
+       By default, after \x that is not followed by {, from zero to two  hexa-
+       decimal  digits  are  read (letters can be in upper or lower case). Any
        number of hexadecimal digits may appear between \x{ and }. If a charac-
-       ter other than a hexadecimal digit appears between \x{  and  },  or  if
+       ter  other  than  a  hexadecimal digit appears between \x{ and }, or if
        there is no terminating }, an error occurs.
 
-       If  the  PCRE_JAVASCRIPT_COMPAT option is set, the interpretation of \x
-       is as just described only when it is followed by two  hexadecimal  dig-
-       its.   Otherwise,  it  matches  a  literal "x" character. In JavaScript
+       If the PCRE_JAVASCRIPT_COMPAT option is set, the interpretation  of  \x
+       is  as  just described only when it is followed by two hexadecimal dig-
+       its.  Otherwise, it matches a  literal  "x"  character.  In  JavaScript
        mode, support for code points greater than 256 is provided by \u, which
-       must  be  followed  by  four hexadecimal digits; otherwise it matches a
+       must be followed by four hexadecimal digits;  otherwise  it  matches  a
        literal "u" character.
 
        Characters whose value is less than 256 can be defined by either of the
-       two  syntaxes for \x (or by \u in JavaScript mode). There is no differ-
+       two syntaxes for \x (or by \u in JavaScript mode). There is no  differ-
        ence in the way they are handled. For example, \xdc is exactly the same
        as \x{dc} (or \u00dc in JavaScript mode).
 
    Constraints on character values
 
-       Characters  that  are  specified using octal or hexadecimal numbers are
+       Characters that are specified using octal or  hexadecimal  numbers  are
        limited to certain values, as follows:
 
          8-bit non-UTF mode    less than 0x100
@@ -5116,44 +5129,44 @@ BACKSLASH
          32-bit non-UTF mode   less than 0x100000000
          32-bit UTF-32 mode    less than 0x10ffff and a valid codepoint
 
-       Invalid Unicode codepoints are the range  0xd800  to  0xdfff  (the  so-
+       Invalid  Unicode  codepoints  are  the  range 0xd800 to 0xdfff (the so-
        called "surrogate" codepoints), and 0xffef.
 
    Escape sequences in character classes
 
        All the sequences that define a single character value can be used both
-       inside and outside character classes. In addition, inside  a  character
+       inside  and  outside character classes. In addition, inside a character
        class, \b is interpreted as the backspace character (hex 08).
 
-       \N  is not allowed in a character class. \B, \R, and \X are not special
-       inside a character class. Like  other  unrecognized  escape  sequences,
-       they  are  treated  as  the  literal  characters  "B",  "R", and "X" by
-       default, but cause an error if the PCRE_EXTRA option is set. Outside  a
+       \N is not allowed in a character class. \B, \R, and \X are not  special
+       inside  a  character  class.  Like other unrecognized escape sequences,
+       they are treated as  the  literal  characters  "B",  "R",  and  "X"  by
+       default,  but cause an error if the PCRE_EXTRA option is set. Outside a
        character class, these sequences have different meanings.
 
    Unsupported escape sequences
 
-       In  Perl, the sequences \l, \L, \u, and \U are recognized by its string
-       handler and used  to  modify  the  case  of  following  characters.  By
-       default,  PCRE does not support these escape sequences. However, if the
-       PCRE_JAVASCRIPT_COMPAT option is set, \U matches a "U"  character,  and
+       In Perl, the sequences \l, \L, \u, and \U are recognized by its  string
+       handler  and  used  to  modify  the  case  of  following characters. By
+       default, PCRE does not support these escape sequences. However, if  the
+       PCRE_JAVASCRIPT_COMPAT  option  is set, \U matches a "U" character, and
        \u can be used to define a character by code point, as described in the
        previous section.
 
    Absolute and relative back references
 
-       The sequence \g followed by an unsigned or a negative  number,  option-
-       ally  enclosed  in braces, is an absolute or relative back reference. A
+       The  sequence  \g followed by an unsigned or a negative number, option-
+       ally enclosed in braces, is an absolute or relative back  reference.  A
        named back reference can be coded as \g{name}. Back references are dis-
        cussed later, following the discussion of parenthesized subpatterns.
 
    Absolute and relative subroutine calls
 
-       For  compatibility with Oniguruma, the non-Perl syntax \g followed by a
+       For compatibility with Oniguruma, the non-Perl syntax \g followed by  a
        name or a number enclosed either in angle brackets or single quotes, is
-       an  alternative  syntax for referencing a subpattern as a "subroutine".
-       Details are discussed later.   Note  that  \g{...}  (Perl  syntax)  and
-       \g<...>  (Oniguruma  syntax)  are  not synonymous. The former is a back
+       an alternative syntax for referencing a subpattern as  a  "subroutine".
+       Details  are  discussed  later.   Note  that  \g{...} (Perl syntax) and
+       \g<...> (Oniguruma syntax) are not synonymous. The  former  is  a  back
        reference; the latter is a subroutine call.
 
    Generic character types
@@ -5172,59 +5185,59 @@ BACKSLASH
          \W     any "non-word" character
 
        There is also the single sequence \N, which matches a non-newline char-
-       acter.   This  is the same as the "." metacharacter when PCRE_DOTALL is
-       not set. Perl also uses \N to match characters by name; PCRE  does  not
+       acter.  This is the same as the "." metacharacter when  PCRE_DOTALL  is
+       not  set.  Perl also uses \N to match characters by name; PCRE does not
        support this.
 
-       Each  pair of lower and upper case escape sequences partitions the com-
-       plete set of characters into two disjoint  sets.  Any  given  character
-       matches  one, and only one, of each pair. The sequences can appear both
-       inside and outside character classes. They each match one character  of
-       the  appropriate  type.  If the current matching point is at the end of
-       the subject string, all of them fail, because there is no character  to
+       Each pair of lower and upper case escape sequences partitions the  com-
+       plete  set  of  characters  into two disjoint sets. Any given character
+       matches one, and only one, of each pair. The sequences can appear  both
+       inside  and outside character classes. They each match one character of
+       the appropriate type. If the current matching point is at  the  end  of
+       the  subject string, all of them fail, because there is no character to
        match.
 
-       For  compatibility with Perl, \s did not used to match the VT character
-       (code 11), which made it different from the the  POSIX  "space"  class.
-       However,  Perl  added  VT  at  release  5.18, and PCRE followed suit at
-       release 8.34. The default \s characters are now HT  (9),  LF  (10),  VT
-       (11),  FF  (12),  CR  (13),  and space (32), which are defined as white
+       For compatibility with Perl, \s formerly did not match the VT character
+       (code  11),  which  made  it  different from the POSIX "space" class.
+       However, Perl added VT at release  5.18,  and  PCRE  followed  suit  at
+       release  8.34.  The  default  \s characters are now HT (9), LF (10), VT
+       (11), FF (12), CR (13), and space (32),  which  are  defined  as  white
        space in the "C" locale. This list may vary if locale-specific matching
-       is  taking place. For example, in some locales the "non-breaking space"
-       character (\xA0) is recognized as white space, and  in  others  the  VT
+       is taking place. For example, in some locales the "non-breaking  space"
+       character  (\xA0)  is  recognized  as white space, and in others the VT
        character is not.
 
-       A  "word"  character is an underscore or any character that is a letter
-       or digit.  By default, the definition of letters  and  digits  is  con-
-       trolled  by PCRE's low-valued character tables, and may vary if locale-
-       specific matching is taking place (see "Locale support" in the  pcreapi
-       page).  For  example,  in  a French locale such as "fr_FR" in Unix-like
-       systems, or "french" in Windows, some character codes greater than  127
-       are  used  for  accented letters, and these are then matched by \w. The
+       A "word" character is an underscore or any character that is  a  letter
+       or  digit.   By  default,  the definition of letters and digits is con-
+       trolled by PCRE's low-valued character tables, and may vary if  locale-
+       specific  matching is taking place (see "Locale support" in the pcreapi
+       page). For example, in a French locale such  as  "fr_FR"  in  Unix-like
+       systems,  or "french" in Windows, some character codes greater than 127
+       are used for accented letters, and these are then matched  by  \w.  The
        use of locales with Unicode is discouraged.
 
-       By default, characters whose code points are  greater  than  127  never
+       By  default,  characters  whose  code points are greater than 127 never
        match \d, \s, or \w, and always match \D, \S, and \W, although this may
-       vary for characters in the range 128-255 when locale-specific  matching
-       is  happening.   These  escape sequences retain their original meanings
-       from before Unicode support was available, mainly for  efficiency  rea-
-       sons.  If  PCRE  is  compiled  with  Unicode  property support, and the
-       PCRE_UCP option is set, the behaviour is changed so that Unicode  prop-
+       vary  for characters in the range 128-255 when locale-specific matching
+       is happening.  These escape sequences retain  their  original  meanings
+       from  before  Unicode support was available, mainly for efficiency rea-
+       sons. If PCRE is  compiled  with  Unicode  property  support,  and  the
+       PCRE_UCP  option is set, the behaviour is changed so that Unicode prop-
        erties are used to determine character types, as follows:
 
          \d  any character that matches \p{Nd} (decimal digit)
          \s  any character that matches \p{Z} or \h or \v
          \w  any character that matches \p{L} or \p{N}, plus underscore
 
-       The  upper case escapes match the inverse sets of characters. Note that
-       \d matches only decimal digits, whereas \w matches any  Unicode  digit,
-       as  well as any Unicode letter, and underscore. Note also that PCRE_UCP
-       affects \b, and \B because they are defined in  terms  of  \w  and  \W.
+       The upper case escapes match the inverse sets of characters. Note  that
+       \d  matches  only decimal digits, whereas \w matches any Unicode digit,
+       as well as any Unicode letter, and underscore. Note also that  PCRE_UCP
+       affects \b and \B, because they are defined in  terms  of  \w  and  \W.
        Matching these sequences is noticeably slower when PCRE_UCP is set.
 
-       The  sequences  \h, \H, \v, and \V are features that were added to Perl
-       at release 5.10. In contrast to the other sequences, which  match  only
-       ASCII  characters  by  default,  these always match certain high-valued
+       The sequences \h, \H, \v, and \V are features that were added  to  Perl
+       at  release  5.10. In contrast to the other sequences, which match only
+       ASCII characters by default, these  always  match  certain  high-valued
        code points, whether or not PCRE_UCP is set. The horizontal space char-
        acters are:
 
@@ -5263,110 +5276,110 @@ BACKSLASH
 
    Newline sequences
 
-       Outside a character class, by default, the escape sequence  \R  matches
-       any  Unicode newline sequence. In 8-bit non-UTF-8 mode \R is equivalent
+       Outside  a  character class, by default, the escape sequence \R matches
+       any Unicode newline sequence. In 8-bit non-UTF-8 mode \R is  equivalent
        to the following:
 
          (?>\r\n|\n|\x0b|\f|\r|\x85)
 
-       This is an example of an "atomic group", details  of  which  are  given
+       This  is  an  example  of an "atomic group", details of which are given
        below.  This particular group matches either the two-character sequence
-       CR followed by LF, or  one  of  the  single  characters  LF  (linefeed,
-       U+000A),  VT  (vertical  tab, U+000B), FF (form feed, U+000C), CR (car-
-       riage return, U+000D), or NEL (next line,  U+0085).  The  two-character
+       CR  followed  by  LF,  or  one  of  the single characters LF (linefeed,
+       U+000A), VT (vertical tab, U+000B), FF (form feed,  U+000C),  CR  (car-
+       riage  return,  U+000D),  or NEL (next line, U+0085). The two-character
        sequence is treated as a single unit that cannot be split.
 
-       In  other modes, two additional characters whose codepoints are greater
+       In other modes, two additional characters whose codepoints are  greater
        than 255 are added: LS (line separator, U+2028) and PS (paragraph sepa-
-       rator,  U+2029).   Unicode character property support is not needed for
+       rator, U+2029).  Unicode character property support is not  needed  for
        these characters to be recognized.
 
        It is possible to restrict \R to match only CR, LF, or CRLF (instead of
-       the  complete  set  of  Unicode  line  endings)  by  setting the option
+       the complete set  of  Unicode  line  endings)  by  setting  the  option
        PCRE_BSR_ANYCRLF either at compile time or when the pattern is matched.
        (BSR is an abbreviation for "backslash R".) This can be made the default
-       when PCRE is built; if this is the case, the  other  behaviour  can  be
-       requested  via  the  PCRE_BSR_UNICODE  option.   It is also possible to
-       specify these settings by starting a pattern string  with  one  of  the
+       when  PCRE  is  built;  if this is the case, the other behaviour can be
+       requested via the PCRE_BSR_UNICODE option.   It  is  also  possible  to
+       specify  these  settings  by  starting a pattern string with one of the
        following sequences:
 
          (*BSR_ANYCRLF)   CR, LF, or CRLF only
          (*BSR_UNICODE)   any Unicode newline sequence
 
        These override the default and the options given to the compiling func-
-       tion, but they can themselves be  overridden  by  options  given  to  a
-       matching  function.  Note  that  these  special settings, which are not
-       Perl-compatible, are recognized only at the very start  of  a  pattern,
-       and  that  they  must  be  in  upper  case. If more than one of them is
-       present, the last one is used. They can be combined with  a  change  of
+       tion,  but  they  can  themselves  be  overridden by options given to a
+       matching function. Note that these  special  settings,  which  are  not
+       Perl-compatible,  are  recognized  only at the very start of a pattern,
+       and that they must be in upper case.  If  more  than  one  of  them  is
+       present,  the  last  one is used. They can be combined with a change of
        newline convention; for example, a pattern can start with:
 
          (*ANY)(*BSR_ANYCRLF)
 
-       They  can also be combined with the (*UTF8), (*UTF16), (*UTF32), (*UTF)
+       They can also be combined with the (*UTF8), (*UTF16), (*UTF32),  (*UTF)
        or (*UCP) special sequences. Inside a character class, \R is treated as
-       an  unrecognized  escape  sequence,  and  so  matches the letter "R" by
+       an unrecognized escape sequence, and  so  matches  the  letter  "R"  by
        default, but causes an error if PCRE_EXTRA is set.
 
    Unicode character properties
 
        When PCRE is built with Unicode character property support, three addi-
-       tional  escape sequences that match characters with specific properties
-       are available.  When in 8-bit non-UTF-8 mode, these  sequences  are  of
-       course  limited  to  testing  characters whose codepoints are less than
+       tional escape sequences that match characters with specific  properties
+       are  available.   When  in 8-bit non-UTF-8 mode, these sequences are of
+       course limited to testing characters whose  codepoints  are  less  than
        256, but they do work in this mode.  The extra escape sequences are:
 
          \p{xx}   a character with the xx property
          \P{xx}   a character without the xx property
          \X       a Unicode extended grapheme cluster
 
-       The property names represented by xx above are limited to  the  Unicode
+       The  property  names represented by xx above are limited to the Unicode
        script names, the general category properties, "Any", which matches any
-       character  (including  newline),  and  some  special  PCRE   properties
-       (described  in the next section).  Other Perl properties such as "InMu-
-       sicalSymbols" are not currently supported by PCRE.  Note  that  \P{Any}
+       character   (including  newline),  and  some  special  PCRE  properties
+       (described in the next section).  Other Perl properties such as  "InMu-
+       sicalSymbols"  are  not  currently supported by PCRE. Note that \P{Any}
        does not match any characters, so always causes a match failure.
 
        Sets of Unicode characters are defined as belonging to certain scripts.
-       A character from one of these sets can be matched using a script  name.
+       A  character from one of these sets can be matched using a script name.
        For example:
 
          \p{Greek}
          \P{Han}
 
-       Those  that are not part of an identified script are lumped together as
+       Those that are not part of an identified script are lumped together  as
        "Common". The current list of scripts is:
 
-       Arabic, Armenian, Avestan, Balinese, Bamum, Bassa_Vah, Batak,  Bengali,
-       Bopomofo,  Brahmi,  Braille, Buginese, Buhid, Canadian_Aboriginal, Car-
+       Arabic,  Armenian, Avestan, Balinese, Bamum, Bassa_Vah, Batak, Bengali,
+       Bopomofo, Brahmi, Braille, Buginese, Buhid,  Canadian_Aboriginal,  Car-
        ian, Caucasian_Albanian, Chakma, Cham, Cherokee, Common, Coptic, Cunei-
        form, Cypriot, Cyrillic, Deseret, Devanagari, Duployan, Egyptian_Hiero-
        glyphs,  Elbasan,  Ethiopic,  Georgian,  Glagolitic,  Gothic,  Grantha,
-       Greek,  Gujarati,  Gurmukhi,  Han,  Hangul,  Hanunoo, Hebrew, Hiragana,
-       Imperial_Aramaic,    Inherited,     Inscriptional_Pahlavi,     Inscrip-
-       tional_Parthian,   Javanese,   Kaithi,   Kannada,  Katakana,  Kayah_Li,
-       Kharoshthi, Khmer, Khojki, Khudawadi, Lao, Latin, Lepcha,  Limbu,  Lin-
-       ear_A,  Linear_B,  Lisu,  Lycian, Lydian, Mahajani, Malayalam, Mandaic,
-       Manichaean,     Meetei_Mayek,     Mende_Kikakui,      Meroitic_Cursive,
-       Meroitic_Hieroglyphs,  Miao,  Modi, Mongolian, Mro, Myanmar, Nabataean,
-       New_Tai_Lue,  Nko,  Ogham,  Ol_Chiki,  Old_Italic,   Old_North_Arabian,
+       Greek, Gujarati, Gurmukhi,  Han,  Hangul,  Hanunoo,  Hebrew,  Hiragana,
+       Imperial_Aramaic,     Inherited,     Inscriptional_Pahlavi,    Inscrip-
+       tional_Parthian,  Javanese,  Kaithi,   Kannada,   Katakana,   Kayah_Li,
+       Kharoshthi,  Khmer,  Khojki, Khudawadi, Lao, Latin, Lepcha, Limbu, Lin-
+       ear_A, Linear_B, Lisu, Lycian, Lydian,  Mahajani,  Malayalam,  Mandaic,
+       Manichaean,      Meetei_Mayek,     Mende_Kikakui,     Meroitic_Cursive,
+       Meroitic_Hieroglyphs, Miao, Modi, Mongolian, Mro,  Myanmar,  Nabataean,
+       New_Tai_Lue,   Nko,  Ogham,  Ol_Chiki,  Old_Italic,  Old_North_Arabian,
        Old_Permic, Old_Persian, Old_South_Arabian, Old_Turkic, Oriya, Osmanya,
        Pahawh_Hmong,    Palmyrene,    Pau_Cin_Hau,    Phags_Pa,    Phoenician,
-       Psalter_Pahlavi,  Rejang,  Runic,  Samaritan, Saurashtra, Sharada, Sha-
-       vian, Siddham, Sinhala, Sora_Sompeng, Sundanese, Syloti_Nagri,  Syriac,
-       Tagalog,  Tagbanwa,  Tai_Le,  Tai_Tham, Tai_Viet, Takri, Tamil, Telugu,
-       Thaana, Thai, Tibetan, Tifinagh, Tirhuta, Ugaritic,  Vai,  Warang_Citi,
+       Psalter_Pahlavi, Rejang, Runic, Samaritan,  Saurashtra,  Sharada,  Sha-
+       vian,  Siddham, Sinhala, Sora_Sompeng, Sundanese, Syloti_Nagri, Syriac,
+       Tagalog, Tagbanwa, Tai_Le, Tai_Tham, Tai_Viet,  Takri,  Tamil,  Telugu,
+       Thaana,  Thai,  Tibetan, Tifinagh, Tirhuta, Ugaritic, Vai, Warang_Citi,
        Yi.
 
        Each character has exactly one Unicode general category property, spec-
-       ified by a two-letter abbreviation. For compatibility with Perl,  nega-
-       tion  can  be  specified  by including a circumflex between the opening
-       brace and the property name.  For  example,  \p{^Lu}  is  the  same  as
+       ified  by a two-letter abbreviation. For compatibility with Perl, nega-
+       tion can be specified by including a  circumflex  between  the  opening
+       brace  and  the  property  name.  For  example,  \p{^Lu} is the same as
        \P{Lu}.
 
        If only one letter is specified with \p or \P, it includes all the gen-
-       eral category properties that start with that letter. In this case,  in
-       the  absence of negation, the curly brackets in the escape sequence are
+       eral  category properties that start with that letter. In this case, in
+       the absence of negation, the curly brackets in the escape sequence  are
        optional; these two examples have the same effect:
 
          \p{L}
@@ -5418,73 +5431,73 @@ BACKSLASH
          Zp    Paragraph separator
          Zs    Space separator
 
-       The special property L& is also supported: it matches a character  that
-       has  the  Lu,  Ll, or Lt property, in other words, a letter that is not
+       The  special property L& is also supported: it matches a character that
+       has the Lu, Ll, or Lt property, in other words, a letter  that  is  not
        classified as a modifier or "other".
 
-       The Cs (Surrogate) property applies only to  characters  in  the  range
-       U+D800  to U+DFFF. Such characters are not valid in Unicode strings and
-       so cannot be tested by PCRE, unless  UTF  validity  checking  has  been
+       The  Cs  (Surrogate)  property  applies only to characters in the range
+       U+D800 to U+DFFF. Such characters are not valid in Unicode strings  and
+       so  cannot  be  tested  by  PCRE, unless UTF validity checking has been
        turned    off    (see    the    discussion    of    PCRE_NO_UTF8_CHECK,
-       PCRE_NO_UTF16_CHECK and PCRE_NO_UTF32_CHECK in the pcreapi page).  Perl
+       PCRE_NO_UTF16_CHECK  and PCRE_NO_UTF32_CHECK in the pcreapi page). Perl
        does not support the Cs property.
 
-       The  long  synonyms  for  property  names  that  Perl supports (such as
-       \p{Letter}) are not supported by PCRE, nor is it  permitted  to  prefix
+       The long synonyms for  property  names  that  Perl  supports  (such  as
+       \p{Letter})  are  not  supported by PCRE, nor is it permitted to prefix
        any of these properties with "Is".
 
        No character that is in the Unicode table has the Cn (unassigned) prop-
        erty.  Instead, this property is assumed for any code point that is not
        in the Unicode table.
 
-       Specifying  caseless  matching  does not affect these escape sequences.
-       For example, \p{Lu} always matches only upper  case  letters.  This  is
+       Specifying caseless matching does not affect  these  escape  sequences.
+       For  example,  \p{Lu}  always  matches only upper case letters. This is
        different from the behaviour of current versions of Perl.
 
-       Matching  characters  by Unicode property is not fast, because PCRE has
-       to do a multistage table lookup in order to find  a  character's  prop-
+       Matching characters by Unicode property is not fast, because  PCRE  has
+       to  do  a  multistage table lookup in order to find a character's prop-
        erty. That is why the traditional escape sequences such as \d and \w do
        not use Unicode properties in PCRE by default, though you can make them
-       do  so  by  setting the PCRE_UCP option or by starting the pattern with
+       do so by setting the PCRE_UCP option or by starting  the  pattern  with
        (*UCP).
 
    Extended grapheme clusters
 
-       The \X escape matches any number of Unicode  characters  that  form  an
+       The  \X  escape  matches  any number of Unicode characters that form an
        "extended grapheme cluster", and treats the sequence as an atomic group
-       (see below).  Up to and including release 8.31, PCRE  matched  an  ear-
+       (see  below).   Up  to and including release 8.31, PCRE matched an ear-
        lier, simpler definition that was equivalent to
 
          (?>\PM\pM*)
 
-       That  is,  it matched a character without the "mark" property, followed
-       by zero or more characters with the "mark"  property.  Characters  with
-       the  "mark"  property are typically non-spacing accents that affect the
+       That is, it matched a character without the "mark"  property,  followed
+       by  zero  or  more characters with the "mark" property. Characters with
+       the "mark" property are typically non-spacing accents that  affect  the
        preceding character.
 
-       This simple definition was extended in Unicode to include more  compli-
-       cated  kinds of composite character by giving each character a grapheme
-       breaking property, and creating rules  that  use  these  properties  to
-       define  the  boundaries  of  extended grapheme clusters. In releases of
+       This  simple definition was extended in Unicode to include more compli-
+       cated kinds of composite character by giving each character a  grapheme
+       breaking  property,  and  creating  rules  that use these properties to
+       define the boundaries of extended grapheme  clusters.  In  releases  of
        PCRE later than 8.31, \X matches one of these clusters.
 
-       \X always matches at least one character. Then it  decides  whether  to
+       \X  always  matches  at least one character. Then it decides whether to
        add additional characters according to the following rules for ending a
        cluster:
 
        1. End at the end of the subject string.
 
-       2. Do not end between CR and LF; otherwise end after any control  char-
+       2.  Do not end between CR and LF; otherwise end after any control char-
        acter.
 
-       3.  Do  not  break  Hangul (a Korean script) syllable sequences. Hangul
-       characters are of five types: L, V, T, LV, and LVT. An L character  may
-       be  followed by an L, V, LV, or LVT character; an LV or V character may
+       3. Do not break Hangul (a Korean  script)  syllable  sequences.  Hangul
+       characters  are of five types: L, V, T, LV, and LVT. An L character may
+       be followed by an L, V, LV, or LVT character; an LV or V character  may
        be followed by a V or T character; an LVT or T character may be followed
        only by a T character.
 
-       4.  Do not end before extending characters or spacing marks. Characters
-       with the "mark" property always have  the  "extend"  grapheme  breaking
+       4. Do not end before extending characters or spacing marks.  Characters
+       with  the  "mark"  property  always have the "extend" grapheme breaking
        property.
 
        5. Do not end after prepend characters.
@@ -5493,9 +5506,9 @@ BACKSLASH
 
    PCRE's additional properties
 
-       As  well  as the standard Unicode properties described above, PCRE sup-
-       ports four more that make it possible  to  convert  traditional  escape
-       sequences  such as \w and \s to use Unicode properties. PCRE uses these
+       As well as the standard Unicode properties described above,  PCRE  sup-
+       ports  four  more  that  make it possible to convert traditional escape
+       sequences such as \w and \s to use Unicode properties. PCRE uses  these
        non-standard, non-Perl properties internally when PCRE_UCP is set. How-
        ever, they may also be used explicitly. These properties are:
 
@@ -5504,54 +5517,54 @@ BACKSLASH
          Xsp   Any Perl space character
          Xwd   Any Perl "word" character
 
-       Xan  matches  characters that have either the L (letter) or the N (num-
-       ber) property. Xps matches the characters tab, linefeed, vertical  tab,
-       form  feed,  or carriage return, and any other character that has the Z
-       (separator) property.  Xsp is the same as Xps; it used to exclude  ver-
-       tical  tab,  for Perl compatibility, but Perl changed, and so PCRE fol-
-       lowed at release 8.34. Xwd matches the same  characters  as  Xan,  plus
+       Xan matches characters that have either the L (letter) or the  N  (num-
+       ber)  property. Xps matches the characters tab, linefeed, vertical tab,
+       form feed, or carriage return, and any other character that has  the  Z
+       (separator)  property.  Xsp is the same as Xps; it used to exclude ver-
+       tical tab, for Perl compatibility, but Perl changed, and so  PCRE  fol-
+       lowed  at  release  8.34.  Xwd matches the same characters as Xan, plus
        underscore.
 
-       There  is another non-standard property, Xuc, which matches any charac-
-       ter that can be represented by a Universal Character Name  in  C++  and
-       other  programming  languages.  These are the characters $, @, ` (grave
-       accent), and all characters with Unicode code points  greater  than  or
-       equal  to U+00A0, except for the surrogates U+D800 to U+DFFF. Note that
-       most base (ASCII) characters are excluded. (Universal  Character  Names
-       are  of  the  form \uHHHH or \UHHHHHHHH where H is a hexadecimal digit.
+       There is another non-standard property, Xuc, which matches any  charac-
+       ter  that  can  be represented by a Universal Character Name in C++ and
+       other programming languages. These are the characters $,  @,  `  (grave
+       accent),  and  all  characters with Unicode code points greater than or
+       equal to U+00A0, except for the surrogates U+D800 to U+DFFF. Note  that
+       most  base  (ASCII) characters are excluded. (Universal Character Names
+       are of the form \uHHHH or \UHHHHHHHH where H is  a  hexadecimal  digit.
        Note that the Xuc property does not match these sequences but the char-
        acters that they represent.)
 
    Resetting the match start
 
-       The  escape sequence \K causes any previously matched characters not to
+       The escape sequence \K causes any previously matched characters not  to
        be included in the final matched sequence. For example, the pattern:
 
          foo\Kbar
 
-       matches "foobar", but reports that it has matched "bar".  This  feature
-       is  similar  to  a lookbehind assertion (described below).  However, in
-       this case, the part of the subject before the real match does not  have
-       to  be of fixed length, as lookbehind assertions do. The use of \K does
-       not interfere with the setting of captured  substrings.   For  example,
+       matches  "foobar",  but reports that it has matched "bar". This feature
+       is similar to a lookbehind assertion (described  below).   However,  in
+       this  case, the part of the subject before the real match does not have
+       to be of fixed length, as lookbehind assertions do. The use of \K  does
+       not  interfere  with  the setting of captured substrings.  For example,
        when the pattern
 
          (foo)\Kbar
 
        matches "foobar", the first substring is still set to "foo".
 
-       Perl  documents  that  the  use  of  \K  within assertions is "not well
-       defined". In PCRE, \K is acted upon  when  it  occurs  inside  positive
-       assertions,  but  is  ignored  in negative assertions. Note that when a
-       pattern such as (?=ab\K) matches, the reported start of the  match  can
+       Perl documents that the use  of  \K  within  assertions  is  "not  well
+       defined".  In  PCRE,  \K  is  acted upon when it occurs inside positive
+       assertions, but is ignored in negative assertions.  Note  that  when  a
+       pattern  such  as (?=ab\K) matches, the reported start of the match can
        be greater than the end of the match.
 
    Simple assertions
 
-       The  final use of backslash is for certain simple assertions. An asser-
-       tion specifies a condition that has to be met at a particular point  in
-       a  match, without consuming any characters from the subject string. The
-       use of subpatterns for more complicated assertions is described  below.
+       The final use of backslash is for certain simple assertions. An  asser-
+       tion  specifies a condition that has to be met at a particular point in
+       a match, without consuming any characters from the subject string.  The
+       use  of subpatterns for more complicated assertions is described below.
        The backslashed assertions are:
 
          \b     matches at a word boundary
@@ -5562,161 +5575,161 @@ BACKSLASH
          \z     matches only at the end of the subject
          \G     matches at the first matching position in the subject
 
-       Inside  a  character  class, \b has a different meaning; it matches the
-       backspace character. If any other of  these  assertions  appears  in  a
-       character  class, by default it matches the corresponding literal char-
+       Inside a character class, \b has a different meaning;  it  matches  the
+       backspace  character.  If  any  other  of these assertions appears in a
+       character class, by default it matches the corresponding literal  char-
        acter  (for  example,  \B  matches  the  letter  B).  However,  if  the
-       PCRE_EXTRA  option is set, an "invalid escape sequence" error is gener-
+       PCRE_EXTRA option is set, an "invalid escape sequence" error is  gener-
        ated instead.
 
-       A word boundary is a position in the subject string where  the  current
-       character  and  the previous character do not both match \w or \W (i.e.
-       one matches \w and the other matches \W), or the start or  end  of  the
-       string  if  the  first or last character matches \w, respectively. In a
-       UTF mode, the meanings of \w and \W  can  be  changed  by  setting  the
-       PCRE_UCP  option. When this is done, it also affects \b and \B. Neither
-       PCRE nor Perl has a separate "start of word" or "end of  word"  metase-
-       quence.  However,  whatever follows \b normally determines which it is.
+       A  word  boundary is a position in the subject string where the current
+       character and the previous character do not both match \w or  \W  (i.e.
+       one  matches  \w  and the other matches \W), or the start or end of the
+       string if the first or last character matches \w,  respectively.  In  a
+       UTF  mode,  the  meanings  of  \w  and \W can be changed by setting the
+       PCRE_UCP option. When this is done, it also affects \b and \B.  Neither
+       PCRE  nor  Perl has a separate "start of word" or "end of word" metase-
+       quence. However, whatever follows \b normally determines which  it  is.
        For example, the fragment \ba matches "a" at the start of a word.
 
-       The \A, \Z, and \z assertions differ from  the  traditional  circumflex
+       The  \A,  \Z,  and \z assertions differ from the traditional circumflex
        and dollar (described in the next section) in that they only ever match
-       at the very start and end of the subject string, whatever  options  are
-       set.  Thus,  they are independent of multiline mode. These three asser-
+       at  the  very start and end of the subject string, whatever options are
+       set. Thus, they are independent of multiline mode. These  three  asser-
        tions are not affected by the PCRE_NOTBOL or PCRE_NOTEOL options, which
-       affect  only the behaviour of the circumflex and dollar metacharacters.
-       However, if the startoffset argument of pcre_exec() is non-zero,  indi-
+       affect only the behaviour of the circumflex and dollar  metacharacters.
+       However,  if the startoffset argument of pcre_exec() is non-zero, indi-
        cating that matching is to start at a point other than the beginning of
-       the subject, \A can never match. The difference between \Z  and  \z  is
+       the  subject,  \A  can never match. The difference between \Z and \z is
        that \Z matches before a newline at the end of the string as well as at
        the very end, whereas \z matches only at the end.
 
-       The \G assertion is true only when the current matching position is  at
-       the  start point of the match, as specified by the startoffset argument
-       of pcre_exec(). It differs from \A when the  value  of  startoffset  is
-       non-zero.  By calling pcre_exec() multiple times with appropriate argu-
+       The  \G assertion is true only when the current matching position is at
+       the start point of the match, as specified by the startoffset  argument
+       of  pcre_exec().  It  differs  from \A when the value of startoffset is
+       non-zero. By calling pcre_exec() multiple times with appropriate  argu-
        ments, you can mimic Perl's /g option, and it is in this kind of imple-
        mentation where \G can be useful.
 
-       Note,  however,  that  PCRE's interpretation of \G, as the start of the
+       Note, however, that PCRE's interpretation of \G, as the  start  of  the
        current match, is subtly different from Perl's, which defines it as the
-       end  of  the  previous  match. In Perl, these can be different when the
-       previously matched string was empty. Because PCRE does just  one  match
+       end of the previous match. In Perl, these can  be  different  when  the
+       previously  matched  string was empty. Because PCRE does just one match
        at a time, it cannot reproduce this behaviour.
 
-       If  all  the alternatives of a pattern begin with \G, the expression is
+       If all the alternatives of a pattern begin with \G, the  expression  is
        anchored to the starting match position, and the "anchored" flag is set
        in the compiled regular expression.
 
 
 CIRCUMFLEX AND DOLLAR
 
-       The  circumflex  and  dollar  metacharacters are zero-width assertions.
-       That is, they test for a particular condition being true  without  con-
+       The circumflex and dollar  metacharacters  are  zero-width  assertions.
+       That  is,  they test for a particular condition being true without con-
        suming any characters from the subject string.
 
        Outside a character class, in the default matching mode, the circumflex
-       character is an assertion that is true only  if  the  current  matching
-       point  is  at the start of the subject string. If the startoffset argu-
-       ment of pcre_exec() is non-zero, circumflex  can  never  match  if  the
-       PCRE_MULTILINE  option  is  unset. Inside a character class, circumflex
+       character  is  an  assertion  that is true only if the current matching
+       point is at the start of the subject string. If the  startoffset  argu-
+       ment  of  pcre_exec()  is  non-zero,  circumflex can never match if the
+       PCRE_MULTILINE option is unset. Inside a  character  class,  circumflex
        has an entirely different meaning (see below).
 
-       Circumflex need not be the first character of the pattern if  a  number
-       of  alternatives are involved, but it should be the first thing in each
-       alternative in which it appears if the pattern is ever  to  match  that
-       branch.  If all possible alternatives start with a circumflex, that is,
-       if the pattern is constrained to match only at the start  of  the  sub-
-       ject,  it  is  said  to be an "anchored" pattern. (There are also other
+       Circumflex  need  not be the first character of the pattern if a number
+       of alternatives are involved, but it should be the first thing in  each
+       alternative  in  which  it appears if the pattern is ever to match that
+       branch. If all possible alternatives start with a circumflex, that  is,
+       if  the  pattern  is constrained to match only at the start of the sub-
+       ject, it is said to be an "anchored" pattern.  (There  are  also  other
        constructs that can cause a pattern to be anchored.)
 
-       The dollar character is an assertion that is true only if  the  current
-       matching  point  is  at  the  end of the subject string, or immediately
-       before a newline at the end of the string (by default). Note,  however,
-       that  it  does  not  actually match the newline. Dollar need not be the
+       The  dollar  character is an assertion that is true only if the current
+       matching point is at the end of  the  subject  string,  or  immediately
+       before  a newline at the end of the string (by default). Note, however,
+       that it does not actually match the newline. Dollar  need  not  be  the
        last character of the pattern if a number of alternatives are involved,
-       but  it should be the last item in any branch in which it appears. Dol-
+       but it should be the last item in any branch in which it appears.  Dol-
        lar has no special meaning in a character class.
 
-       The meaning of dollar can be changed so that it  matches  only  at  the
-       very  end  of  the string, by setting the PCRE_DOLLAR_ENDONLY option at
+       The  meaning  of  dollar  can be changed so that it matches only at the
+       very end of the string, by setting the  PCRE_DOLLAR_ENDONLY  option  at
        compile time. This does not affect the \Z assertion.
 
        The meanings of the circumflex and dollar characters are changed if the
-       PCRE_MULTILINE  option  is  set.  When  this  is the case, a circumflex
-       matches immediately after internal newlines as well as at the start  of
-       the  subject  string.  It  does not match after a newline that ends the
-       string. A dollar matches before any newlines in the string, as well  as
-       at  the very end, when PCRE_MULTILINE is set. When newline is specified
-       as the two-character sequence CRLF, isolated CR and  LF  characters  do
+       PCRE_MULTILINE option is set. When  this  is  the  case,  a  circumflex
+       matches  immediately after internal newlines as well as at the start of
+       the subject string. It does not match after a  newline  that  ends  the
+       string.  A dollar matches before any newlines in the string, as well as
+       at the very end, when PCRE_MULTILINE is set. When newline is  specified
+       as  the  two-character  sequence CRLF, isolated CR and LF characters do
        not indicate newlines.
 
-       For  example, the pattern /^abc$/ matches the subject string "def\nabc"
-       (where \n represents a newline) in multiline mode, but  not  otherwise.
-       Consequently,  patterns  that  are anchored in single line mode because
-       all branches start with ^ are not anchored in  multiline  mode,  and  a
-       match  for  circumflex  is  possible  when  the startoffset argument of
-       pcre_exec() is non-zero. The PCRE_DOLLAR_ENDONLY option is  ignored  if
+       For example, the pattern /^abc$/ matches the subject string  "def\nabc"
+       (where  \n  represents a newline) in multiline mode, but not otherwise.
+       Consequently, patterns that are anchored in single  line  mode  because
+       all  branches  start  with  ^ are not anchored in multiline mode, and a
+       match for circumflex is  possible  when  the  startoffset  argument  of
+       pcre_exec()  is  non-zero. The PCRE_DOLLAR_ENDONLY option is ignored if
        PCRE_MULTILINE is set.
 
-       Note  that  the sequences \A, \Z, and \z can be used to match the start
-       and end of the subject in both modes, and if all branches of a  pattern
-       start  with  \A it is always anchored, whether or not PCRE_MULTILINE is
+       Note that the sequences \A, \Z, and \z can be used to match  the  start
+       and  end of the subject in both modes, and if all branches of a pattern
+       start with \A it is always anchored, whether or not  PCRE_MULTILINE  is
        set.
 
 
 FULL STOP (PERIOD, DOT) AND \N
 
        Outside a character class, a dot in the pattern matches any one charac-
-       ter  in  the subject string except (by default) a character that signi-
+       ter in the subject string except (by default) a character  that  signi-
        fies the end of a line.
 
-       When a line ending is defined as a single character, dot never  matches
-       that  character; when the two-character sequence CRLF is used, dot does
-       not match CR if it is immediately followed  by  LF,  but  otherwise  it
-       matches  all characters (including isolated CRs and LFs). When any Uni-
-       code line endings are being recognized, dot does not match CR or LF  or
+       When  a line ending is defined as a single character, dot never matches
+       that character; when the two-character sequence CRLF is used, dot  does
+       not  match  CR  if  it  is immediately followed by LF, but otherwise it
+       matches all characters (including isolated CRs and LFs). When any  Uni-
+       code  line endings are being recognized, dot does not match CR or LF or
        any of the other line ending characters.
 
-       The  behaviour  of  dot  with regard to newlines can be changed. If the
-       PCRE_DOTALL option is set, a dot matches  any  one  character,  without
+       The behaviour of dot with regard to newlines can  be  changed.  If  the
+       PCRE_DOTALL  option  is  set,  a dot matches any one character, without
        exception. If the two-character sequence CRLF is present in the subject
        string, it takes two dots to match it.
 
-       The handling of dot is entirely independent of the handling of  circum-
-       flex  and  dollar,  the  only relationship being that they both involve
+       The  handling of dot is entirely independent of the handling of circum-
+       flex and dollar, the only relationship being  that  they  both  involve
        newlines. Dot has no special meaning in a character class.
 
-       The escape sequence \N behaves like  a  dot,  except  that  it  is  not
-       affected  by  the  PCRE_DOTALL  option.  In other words, it matches any
-       character except one that signifies the end of a line. Perl  also  uses
+       The  escape  sequence  \N  behaves  like  a  dot, except that it is not
+       affected by the PCRE_DOTALL option. In  other  words,  it  matches  any
+       character  except  one that signifies the end of a line. Perl also uses
        \N to match characters by name; PCRE does not support this.
 
 
 MATCHING A SINGLE DATA UNIT
 
-       Outside  a character class, the escape sequence \C matches any one data
-       unit, whether or not a UTF mode is set. In the 8-bit library, one  data
-       unit  is  one  byte;  in the 16-bit library it is a 16-bit unit; in the
-       32-bit library it is a 32-bit unit. Unlike a  dot,  \C  always  matches
-       line-ending  characters.  The  feature  is provided in Perl in order to
+       Outside a character class, the escape sequence \C matches any one  data
+       unit,  whether or not a UTF mode is set. In the 8-bit library, one data
+       unit is one byte; in the 16-bit library it is a  16-bit  unit;  in  the
+       32-bit  library  it  is  a 32-bit unit. Unlike a dot, \C always matches
+       line-ending characters. The feature is provided in  Perl  in  order  to
        match individual bytes in UTF-8 mode, but it is unclear how it can use-
-       fully  be  used.  Because  \C breaks up characters into individual data
-       units, matching one unit with \C in a UTF mode means that the  rest  of
+       fully be used. Because \C breaks up  characters  into  individual  data
+       units,  matching  one unit with \C in a UTF mode means that the rest of
        the string may start with a malformed UTF character. This has undefined
        results, because PCRE assumes that it is dealing with valid UTF strings
-       (and  by  default  it checks this at the start of processing unless the
-       PCRE_NO_UTF8_CHECK, PCRE_NO_UTF16_CHECK or  PCRE_NO_UTF32_CHECK  option
+       (and by default it checks this at the start of  processing  unless  the
+       PCRE_NO_UTF8_CHECK,  PCRE_NO_UTF16_CHECK  or PCRE_NO_UTF32_CHECK option
        is used).
 
-       PCRE  does  not  allow \C to appear in lookbehind assertions (described
-       below) in a UTF mode, because this would make it impossible  to  calcu-
+       PCRE does not allow \C to appear in  lookbehind  assertions  (described
+       below)  in  a UTF mode, because this would make it impossible to calcu-
        late the length of the lookbehind.
 
        In general, the \C escape sequence is best avoided. However, one way of
-       using it that avoids the problem of malformed UTF characters is to  use
-       a  lookahead to check the length of the next character, as in this pat-
-       tern, which could be used with a UTF-8 string (ignore white  space  and
+       using  it that avoids the problem of malformed UTF characters is to use
+       a lookahead to check the length of the next character, as in this  pat-
+       tern,  which  could be used with a UTF-8 string (ignore white space and
        line breaks):
 
          (?| (?=[\x00-\x7f])(\C) |
@@ -5724,11 +5737,11 @@ MATCHING A SINGLE DATA UNIT
              (?=[\x{800}-\x{ffff}])(\C)(\C)(\C) |
              (?=[\x{10000}-\x{1fffff}])(\C)(\C)(\C)(\C))
 
-       A  group  that starts with (?| resets the capturing parentheses numbers
-       in each alternative (see "Duplicate  Subpattern  Numbers"  below).  The
-       assertions  at  the start of each branch check the next UTF-8 character
-       for values whose encoding uses 1, 2, 3, or 4 bytes,  respectively.  The
-       character's  individual bytes are then captured by the appropriate num-
+       A group that starts with (?| resets the capturing  parentheses  numbers
+       in  each  alternative  (see  "Duplicate Subpattern Numbers" below). The
+       assertions at the start of each branch check the next  UTF-8  character
+       for  values  whose encoding uses 1, 2, 3, or 4 bytes, respectively. The
+       character's individual bytes are then captured by the appropriate  num-
        ber of groups.
 
 
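The byte counts that the lookahead branches above select can be illustrated
outside PCRE. The following is a minimal sketch in Python, not part of PCRE
and not using its API; the names utf8_length and split_into_bytes are
hypothetical helpers introduced only for this illustration. It assumes the
standard UTF-8 ranges (1 byte up to U+007F, 2 bytes up to U+07FF, 3 bytes up
to U+FFFF, 4 bytes above that), and mirrors the upper bound of 1fffff used in
the last branch of the pattern.

```python
def utf8_length(code_point):
    """Return how many bytes UTF-8 needs for one code point.

    Hypothetical helper: the thresholds mirror the ranges tested by the
    lookahead assertions in the pattern, they are not taken from PCRE code.
    """
    if code_point < 0:
        raise ValueError("code point must be non-negative")
    if code_point <= 0x7F:
        return 1
    if code_point <= 0x7FF:
        return 2
    if code_point <= 0xFFFF:
        return 3
    if code_point <= 0x1FFFFF:  # same upper bound as the pattern's last branch
        return 4
    raise ValueError("code point outside the range covered by the pattern")


def split_into_bytes(ch):
    """Return the individual UTF-8 bytes of a single character.

    This is the analogue of capturing each data unit with a separate (\\C)
    group once the length of the character is known.
    """
    data = ch.encode("utf-8")
    # The encoded length should agree with the range-based prediction.
    assert len(data) == utf8_length(ord(ch))
    return list(data)


if __name__ == "__main__":
    for ch in ("A", "\u00e9", "\u20ac", "\U0001F600"):
        print(hex(ord(ch)), utf8_length(ord(ch)), split_into_bytes(ch))
```

The sketch only shows why checking the code point range first makes it safe
to consume a fixed number of data units afterwards; it does not reproduce
PCRE's matching behaviour.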
@@ -5738,109 +5751,109 @@ SQUARE BRACKETS AND CHARACTER CLASSES
        closing square bracket. A closing square bracket on its own is not spe-
        cial by default.  However, if the PCRE_JAVASCRIPT_COMPAT option is set,
        a lone closing square bracket causes a compile-time error. If a closing
-       square bracket is required as a member of the class, it should  be  the
-       first  data  character  in  the  class (after an initial circumflex, if
+       square  bracket  is required as a member of the class, it should be the
+       first data character in the class  (after  an  initial  circumflex,  if
        present) or escaped with a backslash.
 
-       A character class matches a single character in the subject. In  a  UTF
-       mode,  the  character  may  be  more than one data unit long. A matched
+       A  character  class matches a single character in the subject. In a UTF
+       mode, the character may be more than one  data  unit  long.  A  matched
        character must be in the set of characters defined by the class, unless
-       the  first  character in the class definition is a circumflex, in which
+       the first character in the class definition is a circumflex,  in  which
        case the subject character must not be in the set defined by the class.
-       If  a  circumflex is actually required as a member of the class, ensure
+       If a circumflex is actually required as a member of the  class,  ensure
        it is not the first character, or escape it with a backslash.
 
-       For example, the character class [aeiou] matches any lower case  vowel,
-       while  [^aeiou]  matches  any character that is not a lower case vowel.
+       For  example, the character class [aeiou] matches any lower case vowel,
+       while [^aeiou] matches any character that is not a  lower  case  vowel.
        Note that a circumflex is just a convenient notation for specifying the
-       characters  that  are in the class by enumerating those that are not. A
-       class that starts with a circumflex is not an assertion; it still  con-
-       sumes  a  character  from the subject string, and therefore it fails if
+       characters that are in the class by enumerating those that are  not.  A
+       class  that starts with a circumflex is not an assertion; it still con-
+       sumes a character from the subject string, and therefore  it  fails  if
        the current pointer is at the end of the string.
 
        In UTF-8 (UTF-16, UTF-32) mode, characters with values greater than 255
-       (0xffff)  can be included in a class as a literal string of data units,
+       (0xffff) can be included in a class as a literal string of data  units,
        or by using the \x{ escaping mechanism.
 
-       When caseless matching is set, any letters in a  class  represent  both
-       their  upper  case  and lower case versions, so for example, a caseless
-       [aeiou] matches "A" as well as "a", and a caseless  [^aeiou]  does  not
-       match  "A", whereas a caseful version would. In a UTF mode, PCRE always
-       understands the concept of case for characters whose  values  are  less
-       than  128, so caseless matching is always possible. For characters with
-       higher values, the concept of case is supported  if  PCRE  is  compiled
-       with  Unicode  property support, but not otherwise.  If you want to use
-       caseless matching in a UTF mode for characters 128 and above, you  must
-       ensure  that  PCRE is compiled with Unicode property support as well as
+       When  caseless  matching  is set, any letters in a class represent both
+       their upper case and lower case versions, so for  example,  a  caseless
+       [aeiou]  matches  "A"  as well as "a", and a caseless [^aeiou] does not
+       match "A", whereas a caseful version would. In a UTF mode, PCRE  always
+       understands  the  concept  of case for characters whose values are less
+       than 128, so caseless matching is always possible. For characters  with
+       higher  values,  the  concept  of case is supported if PCRE is compiled
+       with Unicode property support, but not otherwise.  If you want  to  use
+       caseless  matching in a UTF mode for characters 128 and above, you must
+       ensure that PCRE is compiled with Unicode property support as  well  as
        with UTF support.
 
-       Characters that might indicate line breaks are  never  treated  in  any
-       special  way  when  matching  character  classes,  whatever line-ending
-       sequence is in  use,  and  whatever  setting  of  the  PCRE_DOTALL  and
+       Characters  that  might  indicate  line breaks are never treated in any
+       special way  when  matching  character  classes,  whatever  line-ending
+       sequence  is  in  use,  and  whatever  setting  of  the PCRE_DOTALL and
        PCRE_MULTILINE options is used. A class such as [^a] always matches one
        of these characters.
 
-       The minus (hyphen) character can be used to specify a range of  charac-
-       ters  in  a  character  class.  For  example,  [d-m] matches any letter
-       between d and m, inclusive. If a  minus  character  is  required  in  a
-       class,  it  must  be  escaped  with a backslash or appear in a position
-       where it cannot be interpreted as indicating a range, typically as  the
+       The  minus (hyphen) character can be used to specify a range of charac-
+       ters in a character  class.  For  example,  [d-m]  matches  any  letter
+       between  d  and  m,  inclusive.  If  a minus character is required in a
+       class, it must be escaped with a backslash  or  appear  in  a  position
+       where  it cannot be interpreted as indicating a range, typically as the
        first or last character in the class, or immediately after a range. For
-       example, [b-d-z] matches letters in the range b to d, a hyphen  charac-
+       example,  [b-d-z] matches letters in the range b to d, a hyphen charac-
        ter, or z.
 
        It is not possible to have the literal character "]" as the end charac-
-       ter of a range. A pattern such as [W-]46] is interpreted as a class  of
-       two  characters ("W" and "-") followed by a literal string "46]", so it
-       would match "W46]" or "-46]". However, if the "]"  is  escaped  with  a
-       backslash  it is interpreted as the end of range, so [W-\]46] is inter-
-       preted as a class containing a range followed by two other  characters.
-       The  octal or hexadecimal representation of "]" can also be used to end
+       ter  of a range. A pattern such as [W-]46] is interpreted as a class of
+       two characters ("W" and "-") followed by a literal string "46]", so  it
+       would  match  "W46]"  or  "-46]". However, if the "]" is escaped with a
+       backslash it is interpreted as the end of range, so [W-\]46] is  inter-
+       preted  as a class containing a range followed by two other characters.
+       The octal or hexadecimal representation of "]" can also be used to  end
        a range.
 
-       An error is generated if a POSIX character  class  (see  below)  or  an
-       escape  sequence other than one that defines a single character appears
-       at a point where a range ending character  is  expected.  For  example,
+       An  error  is  generated  if  a POSIX character class (see below) or an
+       escape sequence other than one that defines a single character  appears
+       at  a  point  where  a range ending character is expected. For example,
        [z-\xff] is valid, but [A-\d] and [A-[:digit:]] are not.
 
-       Ranges  operate in the collating sequence of character values. They can
-       also  be  used  for  characters  specified  numerically,  for   example
-       [\000-\037].  Ranges  can include any characters that are valid for the
+       Ranges operate in the collating sequence of character values. They  can
+       also   be  used  for  characters  specified  numerically,  for  example
+       [\000-\037]. Ranges can include any characters that are valid  for  the
        current mode.
 
        If a range that includes letters is used when caseless matching is set,
        it matches the letters in either case. For example, [W-c] is equivalent
-       to [][\\^_`wxyzabc], matched caselessly, and  in  a  non-UTF  mode,  if
-       character  tables  for  a French locale are in use, [\xc8-\xcb] matches
-       accented E characters in both cases. In UTF modes,  PCRE  supports  the
-       concept  of  case for characters with values greater than 128 only when
+       to  [][\\^_`wxyzabc],  matched  caselessly,  and  in a non-UTF mode, if
+       character tables for a French locale are in  use,  [\xc8-\xcb]  matches
+       accented  E  characters  in both cases. In UTF modes, PCRE supports the
+       concept of case for characters with values greater than 128  only  when
        it is compiled with Unicode property support.
 
-       The character escape sequences \d, \D, \h, \H, \p, \P, \s, \S, \v,  \V,
+       The  character escape sequences \d, \D, \h, \H, \p, \P, \s, \S, \v, \V,
        \w, and \W may appear in a character class, and add the characters that
-       they match to the class. For example, [\dABCDEF] matches any  hexadeci-
-       mal  digit.  In  UTF modes, the PCRE_UCP option affects the meanings of
-       \d, \s, \w and their upper case partners, just as  it  does  when  they
-       appear  outside a character class, as described in the section entitled
+       they  match to the class. For example, [\dABCDEF] matches any hexadeci-
+       mal digit. In UTF modes, the PCRE_UCP option affects  the  meanings  of
+       \d,  \s,  \w  and  their upper case partners, just as it does when they
+       appear outside a character class, as described in the section  entitled
        "Generic character types" above. The escape sequence \b has a different
-       meaning  inside  a character class; it matches the backspace character.
-       The sequences \B, \N, \R, and \X are not  special  inside  a  character
-       class.  Like  any other unrecognized escape sequences, they are treated
-       as the literal characters "B", "N", "R", and "X" by default, but  cause
+       meaning inside a character class; it matches the  backspace  character.
+       The  sequences  \B,  \N,  \R, and \X are not special inside a character
+       class. Like any other unrecognized escape sequences, they  are  treated
+       as  the literal characters "B", "N", "R", and "X" by default, but cause
        an error if the PCRE_EXTRA option is set.
 
-       A  circumflex  can  conveniently  be used with the upper case character
-       types to specify a more restricted set of characters than the  matching
-       lower  case  type.  For example, the class [^\W_] matches any letter or
+       A circumflex can conveniently be used with  the  upper  case  character
+       types  to specify a more restricted set of characters than the matching
+       lower case type.  For example, the class [^\W_] matches any  letter  or
        digit, but not underscore, whereas [\w] includes underscore. A positive
        character class should be read as "something OR something OR ..." and a
        negative class as "NOT something AND NOT something AND NOT ...".
 
-       The only metacharacters that are recognized in  character  classes  are
-       backslash,  hyphen  (only  where  it can be interpreted as specifying a
-       range), circumflex (only at the start), opening  square  bracket  (only
-       when  it can be interpreted as introducing a POSIX class name, or for a
-       special compatibility feature - see the next  two  sections),  and  the
+       The  only  metacharacters  that are recognized in character classes are
+       backslash, hyphen (only where it can be  interpreted  as  specifying  a
+       range),  circumflex  (only  at the start), opening square bracket (only
+       when it can be interpreted as introducing a POSIX class name, or for  a
+       special  compatibility  feature  -  see the next two sections), and the
        terminating  closing  square  bracket.  However,  escaping  other  non-
        alphanumeric characters does no harm.
 
@@ -5848,7 +5861,7 @@ SQUARE BRACKETS AND CHARACTER CLASSES
 POSIX CHARACTER CLASSES
 
        Perl supports the POSIX notation for character classes. This uses names
-       enclosed  by  [: and :] within the enclosing square brackets. PCRE also
+       enclosed by [: and :] within the enclosing square brackets.  PCRE  also
        supports this notation. For example,
 
          [01[:alpha:]%]
@@ -5871,28 +5884,28 @@ POSIX CHARACTER CLASSES
          word     "word" characters (same as \w)
          xdigit   hexadecimal digits
 
-       The  default  "space" characters are HT (9), LF (10), VT (11), FF (12),
-       CR (13), and space (32). If locale-specific matching is  taking  place,
-       the  list  of  space characters may be different; there may be fewer or
+       The default "space" characters are HT (9), LF (10), VT (11),  FF  (12),
+       CR  (13),  and space (32). If locale-specific matching is taking place,
+       the list of space characters may be different; there may  be  fewer  or
        more of them. "Space" used to be different to \s, which did not include
        VT, for Perl compatibility.  However, Perl changed at release 5.18, and
-       PCRE followed at release 8.34.  "Space" and \s now match the  same  set
+       PCRE  followed  at release 8.34.  "Space" and \s now match the same set
        of characters.
 
-       The  name  "word"  is  a Perl extension, and "blank" is a GNU extension
-       from Perl 5.8. Another Perl extension is negation, which  is  indicated
+       The name "word" is a Perl extension, and "blank"  is  a  GNU  extension
+       from  Perl  5.8. Another Perl extension is negation, which is indicated
        by a ^ character after the colon. For example,
 
          [12[:^digit:]]
 
-       matches  "1", "2", or any non-digit. PCRE (and Perl) also recognize the
+       matches "1", "2", or any non-digit. PCRE (and Perl) also recognize  the
        POSIX syntax [.ch.] and [=ch=] where "ch" is a "collating element", but
        these are not supported, and an error is given if they are encountered.
 
        By default, characters with values greater than 128 do not match any of
-       the POSIX character classes. However, if the PCRE_UCP option is  passed
-       to  pcre_compile(),  some  of  the  classes are changed so that Unicode
-       character properties are used. This is achieved  by  replacing  certain
+       the  POSIX character classes. However, if the PCRE_UCP option is passed
+       to pcre_compile(), some of the classes  are  changed  so  that  Unicode
+       character  properties  are  used. This is achieved by replacing certain
        POSIX classes by other sequences, as follows:
 
          [:alnum:]  becomes  \p{Xan}
@@ -5904,10 +5917,10 @@ POSIX CHARACTER CLASSES
          [:upper:]  becomes  \p{Lu}
          [:word:]   becomes  \p{Xwd}
 
-       Negated  versions, such as [:^alpha:] use \P instead of \p. Three other
+       Negated versions, such as [:^alpha:] use \P instead of \p. Three  other
        POSIX classes are handled specially in UCP mode:
 
-       [:graph:] This matches characters that have glyphs that mark  the  page
+       [:graph:] This  matches  characters that have glyphs that mark the page
                  when printed. In Unicode property terms, it matches all char-
                  acters with the L, M, N, P, S, or Cf properties, except for:
 
@@ -5916,58 +5929,58 @@ POSIX CHARACTER CLASSES
                    U+2066 - U+2069  Various "isolate"s
 
 
-       [:print:] This matches the same  characters  as  [:graph:]  plus  space
-                 characters  that  are  not controls, that is, characters with
+       [:print:] This  matches  the  same  characters  as [:graph:] plus space
+                 characters that are not controls, that  is,  characters  with
                  the Zs property.
 
        [:punct:] This matches all characters that have the Unicode P (punctua-
-                 tion)  property,  plus those characters whose code points are
+                 tion) property, plus those characters whose code  points  are
                  less than 128 that have the S (Symbol) property.
 
-       The other POSIX classes are unchanged, and match only  characters  with
+       The  other  POSIX classes are unchanged, and match only characters with
        code points less than 128.
 
 
 COMPATIBILITY FEATURE FOR WORD BOUNDARIES
 
-       In  the POSIX.2 compliant library that was included in 4.4BSD Unix, the
-       ugly syntax [[:<:]] and [[:>:]] is used for matching  "start  of  word"
+       In the POSIX.2 compliant library that was included in 4.4BSD Unix,  the
+       ugly  syntax  [[:<:]]  and [[:>:]] is used for matching "start of word"
        and "end of word". PCRE treats these items as follows:
 
          [[:<:]]  is converted to  \b(?=\w)
          [[:>:]]  is converted to  \b(?<=\w)
 
        Only these exact character sequences are recognized. A sequence such as
-       [a[:<:]b] provokes error for an unrecognized  POSIX  class  name.  This
-       support  is not compatible with Perl. It is provided to help migrations
+       [a[:<:]b]  provokes  error  for  an unrecognized POSIX class name. This
+       support is not compatible with Perl. It is provided to help  migrations
        from other environments, and is best not used in any new patterns. Note
-       that  \b matches at the start and the end of a word (see "Simple asser-
-       tions" above), and in a Perl-style pattern the preceding  or  following
-       character  normally  shows  which  is  wanted, without the need for the
-       assertions that are used above in order to give exactly the  POSIX  be-
+       that \b matches at the start and the end of a word (see "Simple  asser-
+       tions"  above),  and in a Perl-style pattern the preceding or following
+       character normally shows which is wanted,  without  the  need  for  the
+       assertions  that  are used above in order to give exactly the POSIX be-
        haviour.
 
 
 VERTICAL BAR
 
-       Vertical  bar characters are used to separate alternative patterns. For
+       Vertical bar characters are used to separate alternative patterns.  For
        example, the pattern
 
          gilbert|sullivan
 
-       matches either "gilbert" or "sullivan". Any number of alternatives  may
-       appear,  and  an  empty  alternative  is  permitted (matching the empty
+       matches  either "gilbert" or "sullivan". Any number of alternatives may
+       appear, and an empty  alternative  is  permitted  (matching  the  empty
        string). The matching process tries each alternative in turn, from left
-       to  right, and the first one that succeeds is used. If the alternatives
-       are within a subpattern (defined below), "succeeds" means matching  the
+       to right, and the first one that succeeds is used. If the  alternatives
+       are  within a subpattern (defined below), "succeeds" means matching the
        rest of the main pattern as well as the alternative in the subpattern.
 
 
 INTERNAL OPTION SETTING
 
-       The  settings  of  the  PCRE_CASELESS, PCRE_MULTILINE, PCRE_DOTALL, and
-       PCRE_EXTENDED options (which are Perl-compatible) can be  changed  from
-       within  the  pattern  by  a  sequence  of  Perl option letters enclosed
+       The settings of the  PCRE_CASELESS,  PCRE_MULTILINE,  PCRE_DOTALL,  and
+       PCRE_EXTENDED  options  (which are Perl-compatible) can be changed from
+       within the pattern by  a  sequence  of  Perl  option  letters  enclosed
        between "(?" and ")".  The option letters are
 
          i  for PCRE_CASELESS
@@ -5977,51 +5990,51 @@ INTERNAL OPTION SETTING
 
        For example, (?im) sets caseless, multiline matching. It is also possi-
        ble to unset these options by preceding the letter with a hyphen, and a
-       combined setting and unsetting such as (?im-sx), which sets  PCRE_CASE-
-       LESS  and PCRE_MULTILINE while unsetting PCRE_DOTALL and PCRE_EXTENDED,
-       is also permitted. If a  letter  appears  both  before  and  after  the
+       combined  setting and unsetting such as (?im-sx), which sets PCRE_CASE-
+       LESS and PCRE_MULTILINE while unsetting PCRE_DOTALL and  PCRE_EXTENDED,
+       is  also  permitted.  If  a  letter  appears  both before and after the
        hyphen, the option is unset.
 
-       The  PCRE-specific options PCRE_DUPNAMES, PCRE_UNGREEDY, and PCRE_EXTRA
-       can be changed in the same way as the Perl-compatible options by  using
+       The PCRE-specific options PCRE_DUPNAMES, PCRE_UNGREEDY, and  PCRE_EXTRA
+       can  be changed in the same way as the Perl-compatible options by using
        the characters J, U and X respectively.
 
-       When  one  of  these  option  changes occurs at top level (that is, not
-       inside subpattern parentheses), the change applies to the remainder  of
+       When one of these option changes occurs at  top  level  (that  is,  not
+       inside  subpattern parentheses), the change applies to the remainder of
        the pattern that follows. If the change is placed right at the start of
        a pattern, PCRE extracts it into the global options (and it will there-
        fore show up in data extracted by the pcre_fullinfo() function).
 
-       An  option  change  within a subpattern (see below for a description of
-       subpatterns) affects only that part of the subpattern that follows  it,
+       An option change within a subpattern (see below for  a  description  of
+       subpatterns)  affects only that part of the subpattern that follows it,
        so
 
          (a(?i)b)c
 
        matches abc and aBc and no other strings (assuming PCRE_CASELESS is not
-       used).  By this means, options can be made to have  different  settings
-       in  different parts of the pattern. Any changes made in one alternative
-       do carry on into subsequent branches within the  same  subpattern.  For
+       used).   By  this means, options can be made to have different settings
+       in different parts of the pattern. Any changes made in one  alternative
+       do  carry  on  into subsequent branches within the same subpattern. For
        example,
 
          (a(?i)b|c)
 
-       matches  "ab",  "aB",  "c",  and "C", even though when matching "C" the
-       first branch is abandoned before the option setting.  This  is  because
-       the  effects  of option settings happen at compile time. There would be
+       matches "ab", "aB", "c", and "C", even though  when  matching  "C"  the
+       first  branch  is  abandoned before the option setting. This is because
+       the effects of option settings happen at compile time. There  would  be
        some very weird behaviour otherwise.
 
-       Note: There are other PCRE-specific options that  can  be  set  by  the
-       application  when  the  compiling  or matching functions are called. In
-       some cases the pattern can contain special leading  sequences  such  as
-       (*CRLF)  to  override  what  the  application  has set or what has been
-       defaulted.  Details  are  given  in  the  section   entitled   "Newline
-       sequences"  above.  There  are also the (*UTF8), (*UTF16),(*UTF32), and
-       (*UCP) leading sequences that can be used to set UTF and Unicode  prop-
-       erty  modes;  they are equivalent to setting the PCRE_UTF8, PCRE_UTF16,
-       PCRE_UTF32 and the PCRE_UCP options, respectively. The (*UTF)  sequence
-       is  a  generic version that can be used with any of the libraries. How-
-       ever, the application can set the PCRE_NEVER_UTF  option,  which  locks
+       Note:  There  are  other  PCRE-specific  options that can be set by the
+       application when the compiling or matching  functions  are  called.  In
+       some  cases  the  pattern can contain special leading sequences such as
+       (*CRLF) to override what the application  has  set  or  what  has  been
+       defaulted.   Details   are  given  in  the  section  entitled  "Newline
+       sequences" above. There are also the  (*UTF8),  (*UTF16),(*UTF32),  and
+       (*UCP)  leading sequences that can be used to set UTF and Unicode prop-
+       erty modes; they are equivalent to setting the  PCRE_UTF8,  PCRE_UTF16,
+       PCRE_UTF32  and the PCRE_UCP options, respectively. The (*UTF) sequence
+       is a generic version that can be used with any of the  libraries.  How-
+       ever,  the  application  can set the PCRE_NEVER_UTF option, which locks
        out the use of the (*UTF) sequences.
 
 
@@ -6034,18 +6047,18 @@ SUBPATTERNS
 
          cat(aract|erpillar|)
 
-       matches "cataract", "caterpillar", or "cat". Without  the  parentheses,
+       matches  "cataract",  "caterpillar", or "cat". Without the parentheses,
        it would match "cataract", "erpillar" or an empty string.
 
-       2.  It  sets  up  the  subpattern as a capturing subpattern. This means
-       that, when the whole pattern  matches,  that  portion  of  the  subject
+       2. It sets up the subpattern as  a  capturing  subpattern.  This  means
+       that,  when  the  whole  pattern  matches,  that portion of the subject
        string that matched the subpattern is passed back to the caller via the
-       ovector argument of the matching function. (This applies  only  to  the
-       traditional  matching functions; the DFA matching functions do not sup-
+       ovector  argument  of  the matching function. (This applies only to the
+       traditional matching functions; the DFA matching functions do not  sup-
        port capturing.)
 
        Opening parentheses are counted from left to right (starting from 1) to
-       obtain  numbers  for  the  capturing  subpatterns.  For example, if the
+       obtain numbers for the  capturing  subpatterns.  For  example,  if  the
        string "the red king" is matched against the pattern
 
          the ((red|white) (king|queen))
@@ -6053,12 +6066,12 @@ SUBPATTERNS
        the captured substrings are "red king", "red", and "king", and are num-
        bered 1, 2, and 3, respectively.
 
-       The  fact  that  plain  parentheses  fulfil two functions is not always
-       helpful.  There are often times when a grouping subpattern is  required
-       without  a capturing requirement. If an opening parenthesis is followed
-       by a question mark and a colon, the subpattern does not do any  captur-
-       ing,  and  is  not  counted when computing the number of any subsequent
-       capturing subpatterns. For example, if the string "the white queen"  is
+       The fact that plain parentheses fulfil  two  functions  is  not  always
+       helpful.   There are often times when a grouping subpattern is required
+       without a capturing requirement. If an opening parenthesis is  followed
+       by  a question mark and a colon, the subpattern does not do any captur-
+       ing, and is not counted when computing the  number  of  any  subsequent
+       capturing  subpatterns. For example, if the string "the white queen" is
        matched against the pattern
 
          the ((?:red|white) (king|queen))
@@ -6066,37 +6079,37 @@ SUBPATTERNS
        the captured substrings are "white queen" and "queen", and are numbered
        1 and 2. The maximum number of capturing subpatterns is 65535.
 
-       As a convenient shorthand, if any option settings are required  at  the
-       start  of  a  non-capturing  subpattern,  the option letters may appear
+       As  a  convenient shorthand, if any option settings are required at the
+       start of a non-capturing subpattern,  the  option  letters  may  appear
        between the "?" and the ":". Thus the two patterns
 
          (?i:saturday|sunday)
          (?:(?i)saturday|sunday)
 
        match exactly the same set of strings. Because alternative branches are
-       tried  from  left  to right, and options are not reset until the end of
-       the subpattern is reached, an option setting in one branch does  affect
-       subsequent  branches,  so  the above patterns match "SUNDAY" as well as
+       tried from left to right, and options are not reset until  the  end  of
+       the  subpattern is reached, an option setting in one branch does affect
+       subsequent branches, so the above patterns match "SUNDAY"  as  well  as
        "Saturday".
 
 
 DUPLICATE SUBPATTERN NUMBERS
 
        Perl 5.10 introduced a feature whereby each alternative in a subpattern
-       uses  the same numbers for its capturing parentheses. Such a subpattern
-       starts with (?| and is itself a non-capturing subpattern. For  example,
+       uses the same numbers for its capturing parentheses. Such a  subpattern
+       starts  with (?| and is itself a non-capturing subpattern. For example,
        consider this pattern:
 
          (?|(Sat)ur|(Sun))day
 
-       Because  the two alternatives are inside a (?| group, both sets of cap-
-       turing parentheses are numbered one. Thus, when  the  pattern  matches,
-       you  can  look  at captured substring number one, whichever alternative
-       matched. This construct is useful when you want to  capture  part,  but
+       Because the two alternatives are inside a (?| group, both sets of  cap-
+       turing  parentheses  are  numbered one. Thus, when the pattern matches,
+       you can look at captured substring number  one,  whichever  alternative
+       matched.  This  construct  is useful when you want to capture part, but
        not all, of one of a number of alternatives. Inside a (?| group, paren-
-       theses are numbered as usual, but the number is reset at the  start  of
-       each  branch.  The numbers of any capturing parentheses that follow the
-       subpattern start after the highest number used in any branch. The  fol-

[... 2131 lines stripped ...]
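
A few of the worked examples quoted in the hunks above (ranges and hyphens
inside character classes, capture numbering, non-capturing groups, and scoped
option letters) can be checked with a short script. This is only a sketch: it
uses Python's re module as a stand-in for PCRE, which is an assumption and not
part of the commit, and it exercises only constructs that both engines are
expected to treat alike. The helper names class_matches and captures are
hypothetical. POSIX classes such as [:alpha:] and the (?| group are not
supported by Python's re and are deliberately left out.

```python
import re

# Assumption: Python's re is a stand-in for PCRE. Only syntax shared by
# both engines is used; this is not the PCRE C API.


def class_matches(pattern, chars):
    """Return the characters from `chars` that the class `pattern` matches."""
    return [c for c in chars if re.fullmatch(pattern, c)]


def captures(pattern, subject):
    """Return the tuple of captured substrings, or None if there is no match."""
    m = re.match(pattern, subject)
    return m.groups() if m else None


# [b-d-z]: the range b to d, then a literal hyphen, then z.
hyphen_class = class_matches(r"[b-d-z]", "abcd-ez")

# [W-\]46]: an escaped "]" can end a range; "4" and "6" follow the range.
escaped_end = class_matches(r"[W-\]46]", "WXZ]46-")

# [W-]46]: a class of "W" and "-" followed by the literal string "46]".
literal_tail = [s for s in ("W46]", "-46]", "X46]") if re.fullmatch(r"[W-]46]", s)]

# Opening parentheses are numbered from left to right, starting at 1.
numbered = captures(r"the ((red|white) (king|queen))", "the red king")

# (?: ... ) groups without capturing, so later groups are renumbered.
non_capturing = captures(r"the ((?:red|white) (king|queen))", "the white queen")

# An empty alternative matches the empty string.
empty_alt = captures(r"cat(aract|erpillar|)", "cat")

# Option letters between "?" and ":" apply to every branch of the group.
scoped_caseless = bool(re.fullmatch(r"(?i:saturday|sunday)", "SUNDAY"))

if __name__ == "__main__":
    print(hyphen_class, escaped_end, literal_tail)
    print(numbered, non_capturing, empty_alt, scoped_caseless)
```

The form (?i:...) is used rather than a bare (?i) inside the group because
recent Python versions reject option settings that are not at the start of
the pattern, whereas PCRE accepts them as described in the text.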


