George,

Yes, Snowball was on my mind as I wrote my post.
My understanding was that it provides a general way to analyze
text, rather than one analyzer for each language.
Am I wrong?

If only I had enough spare time to take a look,
I would like to help with that (porting our current code that uses
per-language analyzers, and tracking down issues).

Torsten

> -----Original Message-----
> From: George Aroush [mailto:[EMAIL PROTECTED] 
> Sent: Tuesday, April 03, 2007 2:40 AM
> To: [email protected]
> Subject: RE: Best use of language dep. analyzers?
> 
> Hi Torsten,
> 
> Are you referring to the analyzers in Snowball.Net?  I ported those
> analyzers to C#. However, since I lack the language understanding, and
> those analyzers don't come with JUnit tests to port and run in C# land,
> I can't confirm whether the port is valid.  This is the case for 1.9 as
> well as for 2.0, and I'm afraid it will remain the case unless someone
> with language knowledge debugs them.
> 
> -- George Aroush
> 
> -----Original Message-----
> From: Torsten Rendelmann [mailto:[EMAIL PROTECTED] 
> Sent: Saturday, March 31, 2007 11:52 AM
> To: [email protected]
> Subject: Best use of language dep. analyzers?
> 
> Hi, I'm not very familiar with the direction of Lucene (Java)
> development in the field of language-dependent analyzers. What will
> it be?
>  
> We use a slightly modified version of Lucene.Net 1.9 (which includes
> the already published/converted language-dependent analyzers: various
> folders below "Analysis" named "BR", "CJK", "FR", "DE", etc.). As far
> as I understand, they should be used to analyze language-specific
> documents/texts and get rid of stop words, etc., and so provide the
> "real" text to index. So currently we detect the language of each
> document we index, use it to create the "right" analyzer, and add the
> document.
> But they are not stable; we have had various problems using them
> (endless loops and an empty string in a stop word table, just to name
> some).
>  
> Will this be the same for Lucene.Net 2.x? Which "language" packages
> will be available?
> Will they be part of the Apache project?
>  
> Thx,
> Torsten Rendelmann
>  

