Ok, I've been looking at getting the QueryParser to work under this new
world order and I'm having trouble understanding where to hook into it.  As
far as I can see, the QueryParser creates a single Analyzer that is used to
tokenize the query string.  How, then, would you vary the tokenization
properties on a per-field basis?  I don't see where I would hook into the
tokenizer to tell it to pass the appropriate information to the Analyzer at
the appropriate time.  I'm not even at newbie level with JavaCC, so maybe
that's the problem, but I'd appreciate any pointers.

Thanks,
Scott

-----Original Message-----
From: Doug Cutting [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, June 19, 2001 5:47 PM
To: 'Scott Ganyo'; Lucene-Dev (E-mail)
Subject: RE: [Lucene-dev] Allowing an Analyzer to choose a parsing
strategy based on context


> From: Scott Ganyo [mailto:[EMAIL PROTECTED]]
> 
> I've made the following simple and backward-compatible 
> changes to a couple
> of classes in order to allow an Analyzer to choose a parsing 
> strategy based
> on Document and/or Field:
> 
> I changed DocumentWriter.java, line 123 from:
> TokenStream stream = analyzer.tokenStream(reader);
> 
> To:
> TokenStream stream = analyzer.tokenStream(doc, field, reader);
> 
> 
> ...and I changed Analyzer.java implementation to add 
> tokenStream(Document, Field, Reader) method:

I've thought a bit more about this.  The new method should also be usable by
the query parser, right?  But the query parser doesn't have a Document or a
Field.  So I think the new method should instead be:

  public TokenStream tokenStream(String fieldName, Reader text);

That way the query parser can, after having parsed out field names, apply
the appropriate analysis to the tokens.

A utility Analyzer class like the following would also be useful:

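  import java.io.Reader;
  import java.util.Hashtable;

  // Dispatches to a per-field Analyzer registered via add().
  public class FieldAnalyzers extends Analyzer {
    private Hashtable fieldToAnalyzer = new Hashtable();
    public void add(String fieldName, Analyzer analyzer) {
      fieldToAnalyzer.put(fieldName, analyzer);
    }
    public TokenStream tokenStream(String field, Reader reader) {
      // No error checking: an unregistered field fails here.
      return ((Analyzer)fieldToAnalyzer.get(field)).tokenStream(field, reader);
    }
  }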

Probably needs a little more error checking, and maybe a default analyzer,
but you get the idea...
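
For illustration only, here is a rough, self-contained sketch of how the
default-analyzer variant might look.  It makes several assumptions: the
TokenStream and Analyzer types below are simplified stand-ins, not the real
Lucene classes (only the tokenStream(String, Reader) signature comes from
this thread), and ToyAnalyzer, the FieldAnalyzers constructor and the
analyze() helper are hypothetical names invented for the example.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Scanner;

// Simplified stand-in for Lucene's TokenStream (assumption): yields
// plain strings and returns null once the input is exhausted.
interface TokenStream {
  String next();
}

// Simplified stand-in for Lucene's Analyzer (assumption), exposing only
// the field-aware method proposed in this thread.
abstract class Analyzer {
  public abstract TokenStream tokenStream(String fieldName, Reader text);
}

// Hypothetical toy analyzer: splits on whitespace, optionally lowercasing.
class ToyAnalyzer extends Analyzer {
  private final boolean lowercase;

  ToyAnalyzer(boolean lowercase) {
    this.lowercase = lowercase;
  }

  public TokenStream tokenStream(String fieldName, Reader text) {
    final Scanner scanner = new Scanner(text);
    return () -> {
      if (!scanner.hasNext()) {
        return null;
      }
      String token = scanner.next();
      return lowercase ? token.toLowerCase(Locale.ROOT) : token;
    };
  }
}

// Per-field dispatch with a fallback for fields that were never registered.
class FieldAnalyzers extends Analyzer {
  private final Map<String, Analyzer> fieldToAnalyzer = new HashMap<>();
  private final Analyzer defaultAnalyzer;

  FieldAnalyzers(Analyzer defaultAnalyzer) {
    if (defaultAnalyzer == null) {
      throw new IllegalArgumentException("default analyzer is required");
    }
    this.defaultAnalyzer = defaultAnalyzer;
  }

  public void add(String fieldName, Analyzer analyzer) {
    if (fieldName == null || analyzer == null) {
      throw new IllegalArgumentException("field name and analyzer are required");
    }
    fieldToAnalyzer.put(fieldName, analyzer);
  }

  public TokenStream tokenStream(String fieldName, Reader text) {
    Analyzer chosen = fieldToAnalyzer.getOrDefault(fieldName, defaultAnalyzer);
    return chosen.tokenStream(fieldName, text);
  }
}

class FieldAnalyzersDemo {
  // Roughly what a query parser would do once it has parsed out a field
  // name: hand the field name and the clause text to the analyzer.
  static List<String> analyze(Analyzer analyzer, String field, String text) {
    List<String> tokens = new ArrayList<>();
    TokenStream stream = analyzer.tokenStream(field, new StringReader(text));
    for (String t = stream.next(); t != null; t = stream.next()) {
      tokens.add(t);
    }
    return tokens;
  }

  public static void main(String[] args) {
    FieldAnalyzers analyzers = new FieldAnalyzers(new ToyAnalyzer(true));
    analyzers.add("id", new ToyAnalyzer(false)); // keep ids case-sensitive
    System.out.println(analyze(analyzers, "id", "ABC-123"));       // [ABC-123]
    System.out.println(analyze(analyzers, "body", "Hello World")); // [hello, world]
  }
}
```

Requiring the default analyzer in the constructor is one possible design
choice; it means a lookup for an unregistered field can never hit a null.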

Doug

_______________________________________________
Lucene-dev mailing list
[EMAIL PROTECTED]
http://lists.sourceforge.net/lists/listinfo/lucene-dev
