hi,

thanks for your replies.

I have tested the SnowballFilter, and it does not stem the term "support",
so the term "support" should appear in all the papers. However, when I add
the SynonymFilter, "support" goes missing.

I think I have to read the Lucene source code again.
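
For what it's worth, here is a toy sketch of why the filter order matters
(plain Java, with a crude made-up stemmer and a tiny synonym map standing in
for the Snowball and WordNet machinery, not the real Lucene classes): if the
stemmer runs first, the synonym lookup sees stemmed terms and misses a map
keyed on surface forms.

```java
import java.util.*;

// Toy sketch (NOT real Lucene): shows why running a stemmer *before*
// the synonym filter makes the synonym lookup miss.
public class FilterOrder {
    // Hypothetical stemmer: crudely strips a trailing "ing" or "s".
    static String stem(String term) {
        if (term.endsWith("ing")) return term.substring(0, term.length() - 3);
        if (term.endsWith("s"))   return term.substring(0, term.length() - 1);
        return term;
    }

    // Synonym map keyed on *unstemmed* surface forms, as a WordNet-style
    // dictionary would be.
    static final Map<String, List<String>> SYNONYMS =
        Map.of("supports", List.of("backs", "upholds"));

    static List<String> lookupSynonyms(String term) {
        return SYNONYMS.getOrDefault(term, List.of());
    }

    // Stem first, then look up synonyms: the stemmed form misses the map.
    static List<String> stemThenSynonyms(String term) {
        return lookupSynonyms(stem(term));
    }

    // Look up synonyms first, then stem everything: the surface form hits.
    static List<String> synonymsThenStem(String term) {
        List<String> out = new ArrayList<>();
        out.add(stem(term));
        for (String syn : lookupSynonyms(term)) out.add(stem(syn));
        return out;
    }

    public static void main(String[] args) {
        System.out.println(stemThenSynonyms("supports"));  // [] -- no hit
        System.out.println(synonymsThenStem("supports"));  // [support, back, uphold]
    }
}
```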

yours truly

Jiang Xing

On 1/17/06, Erik Hatcher <[EMAIL PROTECTED]> wrote:
>
>
> On Jan 17, 2006, at 12:14 AM, jason wrote:
> > It is adding tokens into the same position as the original token.
> > And then,
> > I used the QueryParser for searching and the snowball analyzer for
> > parsing.
>
> Ok, so you're only using the SynonymAnalyzer for indexing, and the
> SnowballAnalyzer for QueryParser, correct?  If so, that is reasonable.
>
> >     public TokenStream tokenStream(String fieldName, Reader reader){
> >
> >         TokenStream result = new StandardTokenizer(reader);
> >         result = new StandardFilter(result);
> >         result = new LowerCaseFilter(result);
> >         if (stopword != null){
> >           result = new StopFilter(result, stopword);
> >         }
> >
> >         result = new SnowballFilter(result, "Lovins");
> >
> >         result = new SynonymFilter(result, engine);
> >
> >         return result;
> >     }
> >
> > }
> > I added some code to the SnowballFilter (line 75-79). If I only use the
> > SnowballFilter, the term "support" can be found in all 17 documents.
> > However, if the line "result = new SynonymFilter(result, engine);" is
> > used, the term "support" cannot be found in some documents.
>
>
> It looks like you borrowed SynonymAnalyzer from the Lucene in Action
> code.  But you've tweaked some things.  One thing that is clearly
> amiss is that you're looking up synonyms for stemmed words, which is
> not going to work (unless you stemmed the WordNet words beforehand,
> but I doubt you did that, and it would be quite odd to do so).  You're
> probably not injecting many synonyms at all.
>
> I encourage you to "analyze your analyzer" by running some utilities
> such as the Analyzer demo that comes with Lucene in Action's code.
> You'll have some more insight into this issue when trying this out in
> isolation from query parsing and other complexities.
>
> >   /** Returns the next input Token, after being stemmed */
> >   public final Token next() throws IOException {
> >     Token token = input.next();
> >     if (token == null)
> >       return null;
> >     stemmer.setCurrent(token.termText());
> >     try {
> >       stemMethod.invoke(stemmer, EMPTY_ARGS);
> >     } catch (Exception e) {
> >       throw new RuntimeException(e.toString());
> >     }
> >
> >     Token newToken = new Token(stemmer.getCurrent(),
> >                       token.startOffset(), token.endOffset(),
> > token.type());
> >     //check the tokens.
> >     if(newToken.termText().equals("support")){
> >         System.out.println("the term support is found");
> >     }
>
> I'm not sure what the exact solution to your dilemma is, but doing
> more testing with your analyzer will likely shed light on it for you.
>
>        Erik
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>
>
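
Erik's "analyze your analyzer" suggestion can be sketched even without Lucene
on the classpath. A toy harness like the one below (plain Java, with stand-in
filters, not the real TokenStream API) prints what each stage emits, which is
exactly how you spot the stage where "support" disappears:

```java
import java.util.*;
import java.util.function.UnaryOperator;

// Toy harness in the spirit of the Lucene in Action Analyzer demo
// (plain Java, NOT the real Lucene API): run a token list through
// named stages and print what each stage emits.
public class AnalyzerDebug {
    static List<String> stage(String name, UnaryOperator<List<String>> op,
                              List<String> tokens) {
        List<String> out = op.apply(tokens);
        System.out.println(name + " -> " + out);
        return out;
    }

    // Crude stand-ins for LowerCaseFilter and SnowballFilter.
    static List<String> analyze(List<String> tokens) {
        tokens = stage("lowercase",
                t -> t.stream().map(String::toLowerCase).toList(), tokens);
        tokens = stage("stem",
                t -> t.stream()
                      .map(s -> s.endsWith("s") ? s.substring(0, s.length() - 1) : s)
                      .toList(), tokens);
        return tokens;
    }

    public static void main(String[] args) {
        analyze(List.of("Lucene", "Supports", "Queries"));
    }
}
```

Running the real analyzer chain through a printer like this, one filter at a
time, shows which filter swallows or rewrites the term.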
