Maybe I should show some examples where I think custom configuration can be useful. Let me give you two:
1) As of now, KStem conflates both "connector" and "connected" to the same term, "connect".
2) Conversely, it does not conflate "transaction" and "transactions" to the same term.

With an option to modify the internal lexicons, I would be able to adapt KStem to work better for specific text corpora.

What do you think?

Regards,
Lukas

On Mon, Jun 20, 2011 at 12:55 PM, Lukáš Vlček <lukas.vl...@gmail.com> wrote:
> Hi,
>
> Is there any API in KStem filter for lexicons configuration?
>
> As far as I understand the original code works in such a way that lexicons
> are loaded from files at startup (see
> http://lexicalresearch.com/kstem-doc.txt). The author (Robert Krovetz)
> names possibility to modify lexicons among advantages of KStem compared to
> other stemmers.
>
> Do people not need it? Would it be a useful addition for KStem filter to
> allow custom lexicon configurations in its API?
>
> Regards,
> Lukas
>
> Note: Big kudos to all who participated in bringing KStem into Lucene!
>