[EMAIL PROTECTED] wrote:
> from the awk man page:
>
> Collating Symbols [...]
>
> Equivalence Classes [...]
>
> but this isn't explained in the grep man page nor the info grep
> page on the website: http://www.gnu.org/software/grep/doc/

Thanks for noting this. See file #5824,
https://savannah.gnu.org/patch/download.php?file_id=5824, from the
Consolidated documentation patch at
https://savannah.gnu.org/patch/?func=detailitem&item_id=4610 .
Then apply the attached patch. If you can improve on it, please do so.

Benno

--- grepplus/doc/grep.1.old 2006-05-23 23:12:09.000000000 +0200
+++ grepplus/doc/grep.1 2006-06-10 14:44:05.000000000 +0200
@@ -649,7 +649,7 @@
 environment variable to the value
 .BR C .
 .PP
-Finally, certain named classes of characters are predefined within
+Certain named classes of characters are predefined within
 bracket expressions, as follows.
 Their names are self explanatory, and they are
 .BR [:alnum:] ,
@@ -684,6 +684,57 @@
 Finally, to include a literal
 .B \-
 place it last.
+.PP
+Two additional special sequences can appear in character lists.
+These are
+.B "collating symbols"
+and
+.BR "equivalence classes" .
+Both apply to
+non-\s-1ASCII\s+1 character sets. Such character sets can have single symbols
+(called
+.IR "collating elements" )
+that are represented with more than one character,
+as well as several characters that are equivalent for sorting, or
+.IR collating ,
+purposes.
+.PP
+A
+.B "collating symbol"
+is a multi-character collating element enclosed in
+.B [.
+and
+.BR .] .
+For example, if
+.B ch
+is a collating element, then
+.B [[.ch.]]
+is a regular expression that matches this collating element, while
+.B [ch]
+is a regular expression that matches either
+.B c
+or
+.BR h .
+.PP
+An
+.B "equivalence class"
+is a locale-specific name for a list of characters
+that are equivalent. The name is enclosed in
+.B [=
+and
+.BR =] .
+For example, the name
+.B e
+might be used to represent all of
+\*(lqe,\*(rq \*(lqe\h'-\w:e:u'\',\*(rq and \*(lqe\h'-\w:e:u'\`.\*(rq
+In this case,
+.B [[=e=]]
+is a regular expression
+that matches any of
+.BR e ,
+.BR "e\h'-\w:e:u'\'" ,
+or
+.BR "e\h'-\w:e:u'\`" .
 .SS Anchoring
 The caret
 .B ^
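
For anyone who wants to try the two constructs the patch documents, here is a rough sketch. It assumes GNU grep and forces the C locale so the results do not depend on which locales are installed; the sample input lines and the variable names are made up for illustration. In the C locale every collating element is a single character, so the multi-character [[.ch.]] case from the patch text cannot be shown this way.

```shell
# Assumption: GNU grep. LC_ALL=C keeps the behaviour independent of the
# locales installed on the machine.

# [[.-.]] is a collating symbol naming the single element "-", so it
# matches a literal hyphen. Count the input lines that contain one.
hyphen_lines=$(printf 'a-b\nab\n' | LC_ALL=C grep -c '[[.-.]]')

# [[=e=]] is the equivalence class of "e". In the C locale that class
# contains only "e" itself. Count the lines that consist of one member.
e_lines=$(printf 'e\nf\n' | LC_ALL=C grep -c '^[[=e=]]$')

# [ch] is an ordinary list: it matches either "c" or "h", one character
# at a time. Count the lines that are exactly one such character.
ch_lines=$(printf 'c\nh\nx\n' | LC_ALL=C grep -c '^[ch]$')

echo "hyphen=$hyphen_lines e=$e_lines ch=$ch_lines"
```

In a locale whose collation data defines multi-character elements or accented equivalents, [[.ch.]] and [[=e=]] may match more than this, as the patch text describes; whether they do depends on the locale data on the system.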
