George,

Yes, Snowball was on my mind when I wrote my post. My understanding was that it provides one general way to analyze text, rather than a separate analyzer for each language. Am I wrong?
If only I had enough spare time to take a look, I would like to help with that (porting our current code, which uses per-language analyzers, and tracking down issues).

Torsten

> -----Original Message-----
> From: George Aroush [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, April 03, 2007 2:40 AM
> To: [email protected]
> Subject: RE: Best use of language dep. analyzers?
>
> Hi Torsten,
>
> Are you referring to the analyzers in Snowball.Net? I ported those
> analyzers to C#; however, since I lack the language understanding, and
> those analyzers don't come with JUnit tests to port and run in C# land,
> I can't confirm whether the port is valid. This is the case for 1.9 as
> well as for 2.0, and I'm afraid it will remain the case unless someone
> with language knowledge debugs them.
>
> -- George Aroush
>
> -----Original Message-----
> From: Torsten Rendelmann [mailto:[EMAIL PROTECTED]
> Sent: Saturday, March 31, 2007 11:52 AM
> To: [email protected]
> Subject: Best use of language dep. analyzers?
>
> Hi, I'm not very familiar with the direction of Lucene (Java)
> development in the field of language-dependent analyzers. What will it be?
>
> We use a slightly modified version of Lucene.Net 1.9 (which includes the
> already published/converted language-dependent analyzers: the various
> folders below "Analysis" named "BR", "CJK", "FR", "DE", etc.). As far as
> I understand, they should be used to analyze language-specific
> documents/texts and get rid of stop words, etc., and so provide the
> "real" text to index. So currently we detect the language of each
> document we index, map it to the "right" analyzer, and add the document.
> But they are not stable; we have hit various problems using them
> (endless loops, an empty string in a stop word table, to name just a few).
>
> Will this be the same for Lucene.Net 2.x? What "language" package will
> be available?
> Will it be part of the Apache project?
>
> Thx,
> Torsten Rendelmann
