Erik Hatcher wrote:
> Rather than changing StandardAnalyzer, you could create a custom
> Analyzer that is something along the lines of StandardTokenizer ->
> custom apostrophe splitting filter -> ISOLatinFilter.

Why not include that in the FrenchStemFilter "next()" method itself? Would that be bad design? I'm also quite concerned about performance, but it seems to me that your solution will only affect tokens of type "APOSTROPHE", so the overhead should be negligible, right?

> You get a special type for words with interior apostrophes from
> StandardTokenizer (look at StandardFilter to see how that works). You
> could create a simple TokenFilter that splits apostrophe'd tokens
> into two.

I'm not sure how to do that efficiently. Is it something like this?

<code>
private Stack subTokens; // previously initialized

public final Token next() throws IOException {
    Token t = null;
    if (subTokens != null && !subTokens.empty()) {
        // Return pending sub-tokens before reading more input
        t = (Token) subTokens.pop();
    } else {
        t = input.next();
        if (t != null) {
            String type = t.type();
            if (APOSTROPHE_TYPE.equals(type)) {
                tokenizeApostrophe(t, subTokens);
            }
        }
    }
    return t;
}
</code>

with "tokenizeApostrophe(Token, Stack)" splitting the token into two others, under certain conditions, and pushing them onto the stack.

> Maybe it's simple enough also to expand "j" and "l" into "je" and
> "le" in the same step too?

That would be simple, but I'm not sure yet that I want to expand them. It may be useful to index the "j" token after all.

Anyway, thanks for your quick answer,

--
Hugo
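
P.S. To make the splitting step concrete, here is a minimal sketch of the string handling that "tokenizeApostrophe" might rely on. It is plain Java with no Lucene dependency; the class name ApostropheSplit and the method split are hypothetical names chosen for illustration. It assumes a term is split only at its first apostrophe, and only when there is text on both sides. It does not handle the "on conditions" part: a word such as "aujourd'hui" would still be split, so a real filter would probably need an exception list.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: splits a term at its first
// interior apostrophe.
final class ApostropheSplit {

    private ApostropheSplit() {
    }

    static List<String> split(String term) {
        List<String> parts = new ArrayList<String>();
        int pos = term.indexOf('\'');
        // Only split when the apostrophe has text on both sides.
        if (pos <= 0 || pos == term.length() - 1) {
            parts.add(term);
            return parts;
        }
        parts.add(term.substring(0, pos));   // elided prefix, e.g. "l"
        parts.add(term.substring(pos + 1));  // remaining word, e.g. "avion"
        return parts;
    }
}
```

Since a Stack is last-in, first-out, the resulting parts would have to be pushed in reverse order so that next() pops them in their original order.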