On Mon, Nov 30, 2009 at 11:00 AM, erik quanstrom <[email protected]> wrote:
>> ``unfold turns a character, say ë into the set of
>> characters that can be folded to the same base
>> character.  so
>>        ; unfold ë
>>        [eèéêëēĕėęěȅȇȩḕḗḙḛḝẹẻẽếềểễệ]''
>>
>> To me, that sounds like [e-f] should be
>>
>> [eèéêëēĕėęěȅȇȩḕḗḙḛḝẹẻẽếềểễệfƒ]
>>
>> iff e unfolds to the same set as ë. If e only unfolds to [e], then
>> [e-f] would unfold to [ef].
>
> i don't think that works.  consider [e-g].  normally
> this would match 'f', but under your algorithm it wouldn't.
> the problem is that [a-z] works because ascii is arranged
> in alphabetical order.  all the various accented characters
> are not.

It would work if the algorithm expanded the class not just by
enumerating the ASCII letters in the range, but also by adding the
accented chars for every letter it enumerates.
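
A rough sketch of what I mean, in Python rather than Plan 9 C. Note the
assumptions: fold() uses Unicode NFD decomposition as a stand-in for the
real fold table, so the sets it produces may differ from what unfold
prints, and the names fold, build_unfold and expand_range are made up
for illustration only.

```python
import unicodedata

def fold(ch):
    """Assumed stand-in for folding: first code point of the NFD form of ch."""
    return unicodedata.normalize("NFD", ch)[0]

def build_unfold(limit=0x2000):
    """Map each base character to every character below limit that folds to it."""
    table = {}
    for cp in range(limit):
        ch = chr(cp)
        table.setdefault(fold(ch), set()).add(ch)
    return table

UNFOLD = build_unfold()

def expand_range(lo, hi):
    """Expand [lo-hi]: enumerate the code points in the range, then add
    the unfolded (accented) variants of each one."""
    out = set()
    for cp in range(ord(lo), ord(hi) + 1):
        ch = chr(cp)
        out |= UNFOLD.get(ch, {ch})
    return out

chars = expand_range("e", "g")
print("f" in chars, "ë" in chars)
```

With this, [e-g] still contains plain 'f' and 'g', since the range is
enumerated first and the accented chars are only added on top.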

>
> that's why the folding approach has an advantage [a-z]
> will work and will do the Right Thing.
>
> - erik
>
>
