On Mon, Jun 20, 2011 at 8:23 AM, Lukáš Vlček <lukas.vl...@gmail.com> wrote:
> Hi Robert,
> this sounds interesting; I will look at it in more detail.
> However, I do not think this is really a general solution. If I understand
> StemmerOverrideFilter correctly (from a quick glance), it relies on the fact
> that you *know* the exact term (the key in the map) in advance. In other words,
> if I wanted to "fix" some term produced by the KStem filter, I would have to know
> the product of the stemming in advance. This means that if I
> switch to Snowball or Porter or another stemmer instead of KStem, or simply
> update something else in the filtering chain, then I am in trouble. Also, if I
> understand the original KStem implementation correctly, it can still get
> updates to its lexicons, which means that once these updates are ported to the Java
> implementation, it can again result in problems with an existing override filter
> setup.
> More generally, is there any reason why lexicons are not configurable in
Because we have StemmerOverrideFilter and KeywordMarkerFilter. Look at the KStem source code: it uses maps and sets of exceptions, and that is what these filters provide in a general way (StemmerOverrideFilter being the map, and KeywordMarkerFilter being the set). We added these so that they work across the board with all Lucene stemmers.

To be honest, I don't understand your concerns at all; they make no sense to me. If we "updated" KStem or any other algorithm, it would break whatever you are doing either way. A hashmap is a hashmap.
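
To make the map/set point concrete, here is a minimal sketch in plain Java. It is not the Lucene API: the class name, the toyStem method, and the sample entries are all made up for illustration, and it assumes the override and keyword checks run ahead of the stemmer in the chain, so the lookups see the incoming term rather than the stemmer's output.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model only: none of these names come from Lucene.
public class StemOverrideSketch {
    // Plays the role of the StemmerOverrideFilter dictionary: term -> forced stem.
    static final Map<String, String> OVERRIDES = new HashMap<>();
    // Plays the role of the KeywordMarkerFilter set: terms the stemmer must not touch.
    static final Set<String> PROTECTED = new HashSet<>();

    static {
        OVERRIDES.put("mice", "mouse"); // hypothetical sample entry
        PROTECTED.add("news");          // hypothetical sample entry
    }

    // Stand-in for any stemmer (KStem, Porter, Snowball): strips one trailing "s".
    static String toyStem(String term) {
        if (term.length() > 1 && term.endsWith("s")) {
            return term.substring(0, term.length() - 1);
        }
        return term;
    }

    // Overrides and protected terms are resolved first; only the rest is stemmed.
    static String analyze(String term) {
        String forced = OVERRIDES.get(term);
        if (forced != null) {
            return forced;
        }
        if (PROTECTED.contains(term)) {
            return term;
        }
        return toyStem(term);
    }

    public static void main(String[] args) {
        for (String term : new String[] {"mice", "news", "cats"}) {
            System.out.println(term + " -> " + analyze(term));
        }
    }
}
```

Under that assumption, swapping toyStem for a different stemmer leaves the two lookups unchanged, which is the sense in which the map and the set are independent of the stemmer.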