Hi Torsten,

Are you referring to the analyzers in Snowball.Net? I ported those analyzers to C#. However, I don't know those languages, and the analyzers don't come with JUnit tests that I could port and run in C#. So I can't confirm whether the port is valid. This is the case for 1.9 as well as for 2.0, and I'm afraid it will stay that way unless someone who knows those languages debugs them.
-- George Aroush

-----Original Message-----
From: Torsten Rendelmann [mailto:[EMAIL PROTECTED]
Sent: Saturday, March 31, 2007 11:52 AM
To: [email protected]
Subject: Best use of language dep. analyzers?

Hi,

I'm not very familiar with the direction Lucene (Java) development is taking for language-dependent analyzers. Which way will it go?

We use a slightly modified version of Lucene.Net 1.9. It includes the language-dependent analyzers published or converted so far, in the folders below "Analysis" named "BR", "CJK", "FR", "DE", etc. As far as I understand, they are meant for analyzing language-specific documents and texts. They remove stop words and so on, which leaves the "real" text to index. So currently we detect the language of each document we index, create the "right" analyzer for that language, and add the document.

But they are not stable. We have run into various problems using them, for example endless loops and an empty string in a stop word table. Will this be the same for Lucene.Net 2.x? What "language" package will be available? Will it be part of the Apache project?

Thx,
Torsten Rendelmann
