On 22 May 2013 at 20:29, Petite Abeille wrote:
>
> On May 22, 2013, at 7:08 PM, Karl Wettin <[email protected]> wrote:
>
>>> * Use a filter after ASCIIFoldingFilter that discriminates all uses of ae,
>>> oe, oo, and other combinations of double vowels, keeping just the first one.
>>
>> I ended up with that solution.
>>
>> https://issues.apache.org/jira/browse/LUCENE-5013
>
> Interesting problem… perhaps you could generalize your solution a bit… for
> example, in, say, German, one could substitute 'ue' for 'ü', etc… so it looks
> like what you are after is folding double vowels… irrespective of how they
> got there…
>
> So… assuming something along the lines of Sean M. Burke's Unidecode [1] for the
> purpose of ASCII transliteration, what's left is simply to fold double
> vowels, e.g.:
I pasted your reply as a comment in the JIRA issue.
Hmm, interesting thought. I have to consider whether it makes sense to make it
this generic. I think it might be problematic for some languages, especially
Dutch.
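
To make the trade-off concrete, here is a minimal sketch of the generic
"fold double vowels" idea. It is plain Python, not the Lucene filter from the
issue. It assumes the input has already been ASCII-folded, and it treats only
a, e, i, o, u as vowels. The name fold_double_vowels is made up for this
example.

```python
import re

# Hypothetical helper, not a Lucene API: match any two adjacent ASCII vowels
# and capture the first one so the second can be dropped.
_DOUBLE_VOWEL = re.compile(r"([aeiou])[aeiou]", re.IGNORECASE)


def fold_double_vowels(term: str) -> str:
    """Collapse each pair of adjacent vowels to its first vowel.

    Assumes `term` is already ASCII-folded (e.g. 'ü' written as 'ue').
    """
    return _DOUBLE_VOWEL.sub(r"\1", term)


if __name__ == "__main__":
    for word in ("raeksmoergaas", "Muenchen", "maan", "man"):
        print(word, "->", fold_double_vowels(word))
```

With these assumptions "Muenchen" folds to "Munchen", which is what we want,
but Dutch "maan" (moon) folds to "man" (man), so two distinct words collapse
into one term. That is the kind of problem I had in mind.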
karl