Marvin Humphrey wrote:
> I'm curious: are there any cases in French where a string with an
> apostrophe in it ought to be split into two searchable tokens? I
> know of no such cases in English: you never want to search for the ll
> in you'll, or the O in O'Reilly, etc.

First of all, and maybe I am making a false assumption here, but if you strip the leading "j'", "t'" and so on, then a search like:

+text:"il m'aime"

will return documents containing the sentence "il m'aime" (French for "he loves me") as well as documents containing the sentence "il t'aime" (French for "he loves you"), which is wrong, right?

If this is correct, it is why I need to index both "m" and "aime" as distinct tokens. And I guess it is also why "O'Reilly" is not split by the StandardAnalyzer: you don't want to find the documents containing "N'Reilly".

More generally, I am a native French speaker, but I'm not sure there are any cases where a string with an apostrophe has to be split into two (real) searchable tokens. I know the word "aujourd'hui" (French for "today"), but it is really a complete word by itself and does not need to be split further.

If this is important to you, I could look further and ask some French linguists for help.

--
Hugo
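
The phrase-query concern above can be illustrated with a minimal, self-contained sketch. It does not use Lucene or the StandardAnalyzer; the names `ELISIONS`, `tokenize_strip`, `tokenize_split` and `phrase_match` are hypothetical and exist only for this illustration, and the elision list is an assumed, incomplete one.

```python
import re

# Assumed, incomplete list of French elided prefixes (illustration only).
ELISIONS = {"j", "t", "m", "l", "d", "n", "s", "c", "qu"}


def _words(text):
    # Lowercase and keep apostrophes inside words.
    return re.findall(r"[\w']+", text.lower())


def tokenize_strip(text):
    """Drop a leading elided prefix: "m'aime" -> ["aime"]."""
    tokens = []
    for word in _words(text):
        head, sep, tail = word.partition("'")
        if sep and tail and head in ELISIONS:
            tokens.append(tail)
        else:
            tokens.append(word)
    return tokens


def tokenize_split(text):
    """Keep both halves as distinct tokens: "m'aime" -> ["m", "aime"]."""
    tokens = []
    for word in _words(text):
        head, sep, tail = word.partition("'")
        if sep and tail and head in ELISIONS:
            tokens.extend([head, tail])
        else:
            tokens.append(word)
    return tokens


def phrase_match(query, document, tokenize):
    """True if the query tokens occur as a contiguous run in the document."""
    q, d = tokenize(query), tokenize(document)
    return any(d[i:i + len(q)] == q for i in range(len(d) - len(q) + 1))


if __name__ == "__main__":
    # Stripping makes "il m'aime" and "il t'aime" indistinguishable.
    print(phrase_match("il m'aime", "il t'aime", tokenize_strip))
    # Splitting keeps "m" and "t" as tokens, so the phrases differ.
    print(phrase_match("il m'aime", "il t'aime", tokenize_split))
```

Under these assumptions, stripping the prefix lets a phrase search for "il m'aime" also match "il t'aime", while keeping "m" and "aime" as separate tokens does not. Words such as "aujourd'hui" and "O'Reilly" stay whole in both variants, because their first part is not in the assumed elision list.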