>First I have to say, Brian Goetz has done an awesome job putting this 
>together.

A great way to get my attention, thanks!  :)

>Overview of how I think the QueryParser works:
>The basis for the QueryParser is to break up everything into the 
>appropriate type of Query (TermQuery, PhraseQuery, ...), by matching 
>the query pattern. Then to combine these queries into a collection of 
>BooleanClauses and finally BooleanQueries.

That's the theory.  Unfortunately, we cheat a little bit.

>Now the NEAR option will only work with phrase searches since that is the 
>only place where you can set the slop factor.

Right.

>So I had two thoughts.
>1) Create a new pattern <TERM> "NEAR"<NUM>+ <TERM>
>The problem I have with this, is that I don't think it will work since the 
>parser will see <TERM> and never look for the "NEAR" option.

That's one of the real problems with writing two-level parsers with tools 
like JavaCC -- the tokenizer is completely separate from the parser.  The 
current parser recognizes AND, OR, and NOT as tokens only when they are in 
uppercase.  That works because (a) most users type in lower case or mixed 
case, and (b) these words are likely to be stop words (or should be) 
anyway.  NEAR is more dangerous, since it's a useful search word in its own 
right, but the upper-case rule might be enough.
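
To make the upper-case rule concrete, here is a minimal sketch in plain 
Java.  It is not the JavaCC-generated tokenizer; the class KeywordSketch and 
its methods are hypothetical names, and treating NEAR as a fourth operator 
is the assumption under discussion, not current behavior.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; not the JavaCC-generated tokenizer.
final class KeywordSketch {
    // Operators are matched case-sensitively, so "near" and "Near"
    // remain ordinary search terms. NEAR is included as an assumption.
    private static final Set<String> OPERATORS =
            Set.of("AND", "OR", "NOT", "NEAR");

    static boolean isOperator(String token) {
        return OPERATORS.contains(token);
    }

    // Splits on whitespace and labels each token as OP or TERM.
    static List<String> classify(String query) {
        List<String> out = new ArrayList<>();
        for (String tok : query.trim().split("\\s+")) {
            if (tok.isEmpty()) {
                continue; // empty input yields one empty token; skip it
            }
            out.add((isOperator(tok) ? "OP:" : "TERM:") + tok);
        }
        return out;
    }
}
```

With this rule, "hotels near airport" is three plain terms, while 
"hotels NEAR airport" has an operator in the middle.  The cost is that a 
user who really wants to search for the word NEAR in capitals cannot.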

>2) Retroactively make a TermQuery a PhraseQuery with a set slop.

This is a better strategy.
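
Roughly, the retroactive approach could look like the sketch below.  It is 
plain Java rather than the real parser: Phrase is a hypothetical stand-in 
for a phrase query with a slop value, the NEAR<digits> spelling (for 
example NEAR3) is assumed from the pattern you proposed, and the quoted 
output format is only this sketch's way of displaying the result.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; none of these types are Lucene classes.
final class NearRewriteSketch {
    // Stand-in for a phrase query: ordered terms plus a slop value.
    static final class Phrase {
        final List<String> terms;
        final int slop;

        Phrase(List<String> terms, int slop) {
            this.terms = terms;
            this.slop = slop;
        }

        @Override
        public String toString() {
            return "\"" + String.join(" ", terms) + "\"~" + slop;
        }
    }

    // Assumed spelling: uppercase NEAR immediately followed by 1-4 digits.
    private static final Pattern NEAR = Pattern.compile("NEAR(\\d{1,4})");

    // Returns the clauses as display strings. When a NEAR token shows up,
    // the term clause already emitted before it is taken back and replaced
    // by a sloppy phrase made of that term and the one that follows.
    static List<String> rewrite(String query) {
        String[] toks = query.trim().split("\\s+");
        List<String> clauses = new ArrayList<>();
        for (int i = 0; i < toks.length; i++) {
            Matcher m = NEAR.matcher(toks[i]);
            if (m.matches() && !clauses.isEmpty() && i + 1 < toks.length) {
                String left = clauses.remove(clauses.size() - 1);
                String right = toks[++i]; // consume the right-hand term
                int slop = Integer.parseInt(m.group(1));
                clauses.add(new Phrase(List.of(left, right), slop).toString());
            } else if (!toks[i].isEmpty()) {
                clauses.add(toks[i]); // ordinary term, or a dangling NEAR
            }
        }
        return clauses;
    }
}
```

Here "fast lucene NEAR3 parser" comes out as the term fast plus one phrase 
of lucene and parser with slop 3.  The sketch handles a single pair only; 
chained NEARs and a NEAR next to a quoted phrase would need real grammar 
support.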

I had been planning on overhauling this myself, but that got stalled.  I 
didn't want to take it on until we had a clear statement of what the goal 
and functionality of the query parser should be, and we were still looking 
for a syntax that made sense for all the various features.



--
Brian Goetz
Quiotix Corporation
[EMAIL PROTECTED]           Tel: 650-843-1300            Fax: 650-324-8032

http://www.quiotix.com

