Hello,
This was a thread on lucene-user initially, but I'm copying lucene-dev
as well. Sorry about duplicates.
--- Stefan Bergstrand <[EMAIL PROTECTED]> wrote:
> Doug Cutting <[EMAIL PROTECTED]> writes:
>
> Just noticed this problem in my program.
>
> It seems as if the analyzer passed to QueryParser.parse(), never is
> passed to PrefixQuery (which is what my test case is parsed to).
>
> A quick look in QueryParser.jj confirms this:
>
> q = new PrefixQuery(new Term(field, term.image.substring
> (0, term.image.length()-1)));
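To make the quoted line concrete, here is a small sketch with no Lucene
dependency. It only mimics the substring call on the token image; the
class name PrefixImageSketch and the sample inputs are made up for
illustration. The point is that the text handed to Term is the raw image
minus the trailing "*", so nothing an analyzer would do to it (lowercasing,
for example) has happened at that point.

```java
// Hypothetical stand-alone sketch: mimics the substring call in the quoted
// line. Nothing here is Lucene API.
class PrefixImageSketch {
    // Drop the last character of the token image, as the quoted code does.
    static String prefixText(String image) {
        return image.substring(0, image.length() - 1);
    }

    public static void main(String[] args) {
        // The raw text is kept exactly as typed; no analyzer has touched it.
        System.out.println(prefixText("Rou*"));
        System.out.println(prefixText("rou*"));
    }
}
```

If the index was built with an analyzer that lowercases terms, a prefix of
"Rou" would presumably not match "round", which I assume is the kind of
problem Stefan ran into.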
I thought that QueryParser.jj treats queries such as 'rou?d' as wildcard
queries, not prefix queries, no?
In the default token definitions in QueryParser.jj I see this:
| <PREFIXTERM: <_TERM_START_CHAR> (<_TERM_CHAR>)* "*" >
| <WILDTERM: <_TERM_START_CHAR>
    (<_TERM_CHAR> | ( [ "*", "?" ] ))* >
Then further down in QueryParser.jj we have this:
if (wildcard)
    q = new WildcardQuery(new Term(field, term.image));
So a WildcardQuery is being constructed, not a PrefixQuery, I think.
What I don't understand is why the definition of _TERM_START_CHAR looks
like this:
| <#_TERM_START_CHAR: ~[ " ", "\t", "+", "-", "!", "(", ")", ":", "^",
    "[", "]", "\"", "{", "}", "~", "*" ] >
Maybe the name is misleading, but it seems like _TERM_START_CHAR is the
set of characters that a TERM can start with, because later in
QueryParser.jj we have TERM defined as:
| <TERM: <_TERM_START_CHAR> (<_TERM_CHAR>)* >
and _TERM_CHAR has this definition:
| <#_TERM_CHAR: <_TERM_START_CHAR> >
So how can we have a "*" in _TERM_START_CHAR when terms are not allowed
to start with a "*", and if we do have "*", how come we do not have "?"
as well?
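Here is how I would check my reading of those definitions, as a sketch in
plain Java with no Lucene or JavaCC dependency. It assumes that the leading
~ in a JavaCC character list means "any character not in this list", and it
translates the token definitions quoted above into java.util.regex patterns
by hand. The class name TokenSketch and its methods are made up for
illustration, and the sketch does not model how JavaCC chooses between two
tokens that both match.

```java
import java.util.regex.Pattern;

// Hypothetical sketch: hand translation of the token definitions quoted
// above into java.util.regex. Assumes "~[...]" is a negated character list.
class TokenSketch {
    // Any character except the ones listed in _TERM_START_CHAR.
    static final String START = "[^ \\t+\\-!():^\\[\\]\"{}~*]";
    // _TERM_CHAR is defined as _TERM_START_CHAR in the quoted grammar.
    static final String CHAR = START;

    static final Pattern TERM = Pattern.compile(START + CHAR + "*");
    static final Pattern PREFIXTERM = Pattern.compile(START + CHAR + "*\\*");
    static final Pattern WILDTERM =
        Pattern.compile(START + "(?:" + CHAR + "|[*?])*");

    // True if c is allowed by the translated _TERM_START_CHAR class.
    static boolean isTermStartChar(char c) {
        return Pattern.matches(START, String.valueOf(c));
    }

    // Names of the token patterns that match the whole input,
    // separated by spaces (empty string if none match).
    static String matching(String s) {
        StringBuilder sb = new StringBuilder();
        if (TERM.matcher(s).matches()) sb.append("TERM ");
        if (PREFIXTERM.matcher(s).matches()) sb.append("PREFIXTERM ");
        if (WILDTERM.matcher(s).matches()) sb.append("WILDTERM ");
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        for (String q : new String[] {"rou?d", "rou*d", "rou*", "*rou"}) {
            System.out.println(q + " -> [" + matching(q) + "]");
        }
    }
}
```

If the assumption about ~ holds, "*" is excluded from _TERM_START_CHAR
rather than included, and "?" is not excluded, so 'rou?d' also fits the
plain TERM pattern. That might be related to the "rou d" result Michael
reported, but I have not verified it.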
Can somebody correct me wherever I have made false statements,
assumptions, or conclusions?
Thanks,
Otis
> > > From: Howk, Michael [mailto:[EMAIL PROTECTED]]
> > >
> > > Also, Lucene returns the parsed version of each of our
> > > searches. When we
> > > search by rou*d, Lucene parses it as rou*d (which is what we
> > > would expect).
> > > But when we search by rou?d, Lucene parses it as "rou d". It
> > > seems to wrap
> > > the term in quotes and replace the question mark with a
> > > space. Any ideas? Or
> > > can someone give us an idea of how to understand WildcardQuery or
> > > WildcardTermEnum?
> >
> > It sounds like the problem is in the query parser. Brian?
> >
> > Doug
> >
> > --
> > To unsubscribe, e-mail:
> <mailto:[EMAIL PROTECTED]>
> > For additional commands, e-mail:
> <mailto:[EMAIL PROTECTED]>
> >
> >
>
> --
> ---------------------------
> Stefan Bergstrand
> Polopoly - Cultivating the information garden
> Ph: +46 8 506 782 67
> Cell: +46 704 47 82 67
> Fax: +46 8 506 782 51
> [EMAIL PROTECTED], http://www.polopoly.com
>