On Jan 6, 2009, at 7:26 PM, 이지홍 wrote:

Thanks for your answers.
I'm sorry, my written English is not good.
I was telling you about the Lucene Sandbox analyzers.
You can find them at the following URL:
http://svn.apache.org/repos/asf/lucene/java/trunk/contrib/analyzers/src/java/org/apache/lucene/analysis/
There you will find GermanAnalyzer and FrenchAnalyzer.

Let me ask again:

How do the Lucene Sandbox analyzers differ from SnowballAnalyzer?

I would suggest looking at the code. I have never investigated them at a low level. If I had to guess, I bet they just take different approaches to stemming. Chances are neither is right or wrong, and there is no such thing as a perfect stemmer.
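As a toy illustration of "different approaches" (this is not the code inside GermanAnalyzer, FrenchAnalyzer or SnowballAnalyzer), the Java sketch below compares a hypothetical naive rule that drops any trailing "s" with step 1a of Martin Porter's original English stemming algorithm. The naive rule turns "caress" into "cares", conflating it with an unrelated word, while Porter's rule leaves "caress" alone but reduces "ponies" to the non-word "poni". Each rule makes different trade-offs, which is the sense in which no stemmer is perfect. The class and method names are made up for illustration.

```java
import java.util.List;

// Toy illustration only: two suffix-stripping rules, NOT the code used by
// Lucene's GermanAnalyzer, FrenchAnalyzer or SnowballAnalyzer.
class StemmerComparison {

  // Hypothetical naive rule: drop a single trailing "s".
  static String stripTrailingS(String word) {
    return word.endsWith("s") ? word.substring(0, word.length() - 1) : word;
  }

  // Step 1a of the original Porter algorithm, longest suffix first:
  // SSES -> SS, IES -> I, SS -> SS, S -> (removed).
  static String porterStep1a(String word) {
    if (word.endsWith("sses")) return word.substring(0, word.length() - 2);
    if (word.endsWith("ies")) return word.substring(0, word.length() - 2);
    if (word.endsWith("ss")) return word;
    if (word.endsWith("s")) return word.substring(0, word.length() - 1);
    return word;
  }

  public static void main(String[] args) {
    // Print both stems side by side so the disagreements are visible.
    for (String w : List.of("caresses", "ponies", "caress", "cats")) {
      System.out.println(w + ": naive=" + stripTrailingS(w)
          + ", porter1a=" + porterStep1a(w));
    }
  }
}
```

Real stemmers (including the Snowball ones) apply many more steps and conditions; the point here is only that two reasonable rule sets produce different tokens from the same input.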

If I were you, I would set up a small program that takes some number of Strings from your documents in each of the languages, runs them through each Analyzer, and prints out the tokens. I have a _SAMPLE_ of this in my Lucene Boot Camp training code: http://www.lucenebootcamp.com/LuceneBootCamp/training/src/test/java/com/lucenebootcamp/training/basic/AnalyzerTest.java
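A minimal, self-contained sketch of that kind of harness follows. To keep it runnable without Lucene on the classpath, the two "analyzers" are hypothetical stand-ins modeled as Function<String, List<String>> (one splits on whitespace, the other splits on non-letters and lowercases); with Lucene you would put real Analyzer instances in the map and collect the tokens from analyzer.tokenStream(...) instead. The class name AnalyzerComparison and its methods are illustrative only.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Sketch of a "same input, every analyzer" harness. The analyzers are
// hypothetical stand-ins, not Lucene classes.
class AnalyzerComparison {

  // Splits on whitespace only; case and punctuation are kept.
  static List<String> whitespaceTokens(String text) {
    List<String> tokens = new ArrayList<>();
    for (String t : text.trim().split("\\s+")) {
      if (!t.isEmpty()) tokens.add(t);
    }
    return tokens;
  }

  // Splits on any run of non-letters and lowercases each token.
  static List<String> lowercaseLetterTokens(String text) {
    List<String> tokens = new ArrayList<>();
    for (String t : text.split("[^\\p{L}]+")) {
      if (!t.isEmpty()) tokens.add(t.toLowerCase(Locale.ROOT));
    }
    return tokens;
  }

  // Runs one input through every analyzer, keeping insertion order.
  static Map<String, List<String>> compare(String text,
      Map<String, Function<String, List<String>>> analyzers) {
    Map<String, List<String>> result = new LinkedHashMap<>();
    for (Map.Entry<String, Function<String, List<String>>> e : analyzers.entrySet()) {
      result.put(e.getKey(), e.getValue().apply(text));
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, Function<String, List<String>>> analyzers = new LinkedHashMap<>();
    analyzers.put("whitespace", AnalyzerComparison::whitespaceTokens);
    analyzers.put("lowercase-letters", AnalyzerComparison::lowercaseLetterTokens);

    // Replace these with real sentences from your own documents.
    String[] samples = {"Test String Goes here", "Running runners ran; RUNS!"};
    for (String s : samples) {
      System.out.println(s);
      compare(s, analyzers).forEach((name, tokens) ->
          System.out.println("  " + name + ": " + tokens));
    }
  }
}
```

Printing the token lists side by side for the same sample text makes it easy to spot where two analyzers disagree (here, on case and on trailing punctuation such as "ran;" and "RUNS!").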




I don't know that.

Which one is best?

Best for what? It depends on your documents and on what your users search for.




Can you confirm that SnowballAnalyzer covers the English language?


Yes.

Analyzer analyzer = new SnowballAnalyzer("English");



Please teach me.


Please have a look through more of the documentation and try some things out.

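A simple loop like:

import java.io.StringReader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;

Analyzer analyzer = new StandardAnalyzer(); // FILL IN YOUR ANALYZER HERE
TokenStream stream = analyzer.tokenStream("foo", new StringReader("Test String Goes here"));
final Token reusableToken = new Token(); // reused across calls to next()
for (Token token = stream.next(reusableToken); token != null; token = stream.next(reusableToken)) {
  System.out.println("Token: " + token.term());
}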

will go a long way toward understanding how these Analyzers work.


I am doing Lucene Boot Camp at ApacheCon in Amsterdam, Netherlands in March. If you can't make that, I suggest you buy the most excellent "Lucene In Action" by Erik, Otis and Mike M. (http://www.manning.com/hatcher3 ). Otherwise, there are plenty of tutorials and articles on using Lucene at http://wiki.apache.org/lucene-java/Resources and on the Wiki itself: http://wiki.apache.org/lucene-java/ which will cover how to use an analyzer.

You might also check out Solr's Admin UI, which has a built-in way of outputting tokens to the screen given some user input in a text box. See the Solr project for more on that.

Good Luck,
Grant
