On Mon, Nov 30, 2009 at 11:00 AM, erik quanstrom <[email protected]> wrote:
>> ``unfold turns a character, say ë into the set of
>> characters that can be folded to the same base
>> character. so
>> ; unfold ë
>> [eèéêëēĕėęěȅȇȩḕḗḙḛḝẹẻẽếềểễệ]''
>>
>> To me, that sounds like [e-f] should be
>>
>> [eèéêëēĕėęěȅȇȩḕḗḙḛḝẹẻẽếềểễệfƒ]
>>
>> iff e unfolds to the same set as ë. If e only unfolds to [e], then
>> [e-f] would unfold to [ef].
>
> i don't think that works. consider [e-g]. normally
> this would match 'f', but under your algorithm it wouldn't.
> the problem is that [a-z] works because ascii is arranged
> in alphabetical order. all the various accented characters
> are not.

It would work if the algorithm didn't expand the class only by
enumerating the ASCII letters in the range, but also added, for every
letter it enumerates, that letter's accented chars. [e-g] would then
still contain 'f', plus the accented forms of e, f and g.

> that's why the folding approach has an advantage [a-z]
> will work and will do the Right Thing.
>
> - erik
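
A rough sketch of the per-letter expansion I mean, in Python rather than anything from the actual unfold code. The names base, unfold and expand_range are made up for illustration, and the fold is assumed to be "first code point of the NFD decomposition", which is only a stand-in for the real fold table (it will not, for example, fold ƒ to f, so its sets differ from the ones quoted above).

```python
import unicodedata

def base(ch):
    """Assumed fold: the first code point of ch's NFD decomposition."""
    return unicodedata.normalize("NFD", ch)[0]

# Hypothetical unfold table: base character -> all BMP letters folding to it,
# in code point order. Only letters are considered, which also skips surrogates.
UNFOLD = {}
for cp in range(0x10000):
    ch = chr(cp)
    if unicodedata.category(ch).startswith("L"):
        UNFOLD.setdefault(base(ch), []).append(ch)

def unfold(ch):
    """Return every character sharing ch's base character (or ch alone)."""
    return "".join(UNFOLD.get(base(ch), [ch]))

def expand_range(lo, hi):
    """Expand [lo-hi]: walk the code points as usual, unfolding each one."""
    members = (unfold(chr(cp)) for cp in range(ord(lo), ord(hi) + 1))
    return "[" + "".join(members) + "]"

if __name__ == "__main__":
    print(expand_range("e", "g"))
```

The range is still walked in code point order, so 'f' stays in [e-g]; the accented characters are only added on top of each enumerated letter, never used as range endpoints.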
