Best use of language-dependent analyzers?

Torsten Rendelmann Sat, 31 Mar 2007 07:52:36 -0800

Hi, I'm not very familiar with the direction Lucene (Java) development is taking in the field of language-dependent analyzers. Where is it heading?
 
We use a slightly modified version of Lucene.Net 1.9 (which includes the language-dependent analyzers published/converted so far: the various folders below "Analysis" named "BR", "CJK", "FR", "DE", etc.). As far as I understand, they should be used to analyze language-specific documents/texts and get rid of stop words, etc., so that they provide the "real" text to index. So currently we detect the language of the documents we index, map it to the "right" analyzer, and add the document.
But they are not stable; we ran into various problems using them (endless loops and an empty string in a stop word table, to name a few).
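As an illustration of guarding against the empty-string stop word problem, here is a minimal, hypothetical Java sketch (not part of the Lucene or Lucene.Net API) that cleans a raw stop word array before it is handed to an analyzer. The class and method names are made up for this example, and the lowercasing step assumes the analyzer lowercases tokens before stop word filtering. It does nothing about the endless-loop issue.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

public class StopWordSanitizer {

    // Hypothetical helper: trims each entry, skips null/empty/whitespace-only
    // entries, lowercases (assumption: tokens are lowercased before stop
    // filtering), and removes duplicates while keeping the original order.
    public static Set<String> cleanStopWords(String[] raw) {
        Set<String> cleaned = new LinkedHashSet<String>();
        if (raw == null) {
            return cleaned;
        }
        for (String word : raw) {
            if (word == null) {
                continue;
            }
            String trimmed = word.trim().toLowerCase(Locale.ROOT);
            if (!trimmed.isEmpty()) {
                cleaned.add(trimmed);
            }
        }
        return cleaned;
    }

    public static void main(String[] args) {
        // A stop word table containing the kind of bad entries described above
        String[] german = {"und", "", "  ", "Der", "der", null, "die"};
        System.out.println(cleanStopWords(german)); // prints [und, der, die]
    }
}
```

The cleaned set could then be passed to whatever analyzer constructor accepts a custom stop word list, so a bad entry in a shipped table never reaches the filter.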
 
Will this be the same for Lucene.Net 2.x? Which "language" package will be available? Will it be part of the Apache project?
 
Thx,
Torsten Rendelmann
