RE: Best use of language dep. analyzers?

George Aroush Mon, 02 Apr 2007 17:39:50 -0700

Hi Torsten,

Are you referring to the analyzers in Snowball.Net?  I ported those analyzers
to C#; however, since I lack the language understanding, and those analyzers
don't come with JUnit tests to port and run in C# land, I can't confirm whether
the port is valid or not.  This is the case for 1.9 as well as for 2.0, and I'm
afraid it will remain the case unless someone with language knowledge
debugs them.

-- George Aroush

-----Original Message-----
From: Torsten Rendelmann [mailto:[EMAIL PROTECTED] 
Sent: Saturday, March 31, 2007 11:52 AM
To: [email protected]
Subject: Best use of language dep. analyzers?

Hi, I'm not very familiar with the direction of Lucene (Java) development in
the field of language-dependent analyzers. What will it be?

We use a slightly modified version of Lucene.Net 1.9 (which includes the
already published/converted language-dependent analyzers: various folders below
"Analysis" named "BR", "CJK", "FR", "DE" etc.). As far as I understand, they
should be used to analyze language-specific documents/texts and get rid of
stop words, etc., and so provide the "real" text to index. So currently we
detect the language of each document we index, use it to create the "right"
analyzer, and add the document.
But they are not stable: we ran into various problems using them (endless
loops and an empty string in a stop word table, to name a few).
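
For illustration, the detect-then-select workflow described above could be
sketched roughly as follows. This is a minimal Python sketch and not Lucene.Net
code: the names STOP_WORDS, make_analyzer and analyze_document are
hypothetical, the stop-word tables are tiny placeholders, and language
detection is assumed to happen elsewhere. It also shows one defensive measure
against the empty-stop-word problem mentioned above, namely cleaning the table
before use.

```python
import re

# Hypothetical, tiny stop-word tables; real analyzers ship much larger lists.
STOP_WORDS = {
    "en": {"the", "a", "and", "of", ""},  # note the stray empty entry
    "de": {"der", "die", "das", "und"},
    "fr": {"le", "la", "les", "et"},
}

# Runs of word characters count as tokens (Unicode-aware for str patterns).
TOKEN_RE = re.compile(r"\w+")


def make_analyzer(language):
    """Return a function that maps text to index terms for one language."""
    # Unknown languages fall back to an empty stop list,
    # so the text is only tokenized and lowercased.
    raw = STOP_WORDS.get(language, set())
    # Defensive cleanup: drop empty or whitespace-only stop-word entries.
    stop = {w.strip().lower() for w in raw if w and w.strip()}

    def analyze(text):
        tokens = (m.group(0).lower() for m in TOKEN_RE.finditer(text))
        return [t for t in tokens if t not in stop]

    return analyze


def analyze_document(text, language):
    """Pick the analyzer for the (already detected) language and apply it."""
    return make_analyzer(language)(text)
```

In this sketch an unrecognized language code does not raise an error; the text
is still tokenized, just without stop-word removal. Whether that fallback is
appropriate depends on the index.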

Will this be the same for Lucene.Net 2.x? Which "language" packages will be
available?
Will they be part of the Apache project?

Thx,
Torsten Rendelmann
