Hello,
On Jun 1 12:02 Jim Meyering wrote (excerpt):
... the way grep's -i is implemented: it converts both the RE and the buffer-to-search to lower case, and then performs the search.
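To make the concern concrete, here is a small sketch of my own in Python. It is only an illustration and not grep's C code: the helper names match_lower and match_casefold are hypothetical, and I am assuming that Python's str.lower() and str.casefold() are reasonable stand-ins for "convert to lower case" and Unicode case folding. It compares the two approaches on a Greek word written once with a medial sigma and once with a final sigma.

```python
# Illustration only: uses Python's built-in str methods, not grep internals.
# match_lower and match_casefold are hypothetical helper names.

def match_lower(pattern, text):
    """Caseless substring search by lowercasing both sides."""
    return pattern.lower() in text.lower()

def match_casefold(pattern, text):
    """Caseless substring search by Unicode case folding both sides."""
    return pattern.casefold() in text.casefold()

# The same Greek word in three spellings, written with escapes to be explicit.
capital = "\u039f\u0394\u039f\u03a3"  # capital letters, ends in U+03A3
medial = "\u03bf\u03b4\u03bf\u03c3"   # small letters, ends in U+03C3 (sigma)
final = "\u03bf\u03b4\u03bf\u03c2"    # small letters, ends in U+03C2 (final sigma)

# Both strings are already lower case, so lowercasing cannot unify them.
print(match_lower(medial, final))

# Case folding maps U+03A3 and U+03C2 both to U+03C3, so these compare equal.
print(match_casefold(medial, final))
print(match_casefold(capital, final))
```

In this sketch the lowercase-based search misses a pair of strings that differ only in the form of the sigma, while the case-folded search treats them as equal.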
I wonder if "convert ... to lower case" is really a correct
implementation for caseless matching, because in
http://www.unicode.org/versions/Unicode6.1.0/ch05.pdf
I found that "case folding ... is more than just
conversion to lowercase":

--------------------------------------------------------------------
Implementation Guidelines
...
5.18 Case Mappings
...
Complications for Case Mapping
...
Context-dependent Case Mappings. Characters may have different
case mappings, depending on the context surrounding the character
in the original string. For example, U+03A3 [greek capital letter
sigma] lowercases to U+03C3 [greek small letter sigma] if it is
followed by another letter, but lowercases to U+03C2 [greek small
letter final sigma] if it is not.
...
Caseless Matching

Caseless matching is implemented using case folding, which is
the process of mapping characters of different case to a single
form, so that case differences in strings are erased. Case folding
allows for fast caseless matches in lookups because only binary
comparison is required. It is more than just conversion to
lowercase. For example, it correctly handles cases such as the
Greek sigma...
--------------------------------------------------------------------

Is grep's -i implemented via plain conversion to lower case,
or is it actually implemented via "case folding"?

FYI:
http://www.unicode.org/versions/Unicode6.1.0/ch05.pdf
describes in particular the "Turkish I" issue in detail...

Kind Regards
Johannes Meixner
-- 
SUSE LINUX Products GmbH -- Maxfeldstrasse 5 -- 90409 Nuernberg -- Germany
HRB 16746 (AG Nuernberg) GF: Jeff Hawn, Jennifer Guild, Felix Imendoerffer
