> What if the query is '+strong force' and 'strong' is tokenized into
> 'strong' and the alias 'tough' ?  Will the query parser convert it to
> '+(strong OR tough) force' ?

If 'strong' is tokenized to /strong tough/, then the query parser will
turn /+strong force/ into /+"strong tough" force/.
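
To make that concrete, here is a rough sketch of the behavior. It is a
toy model in Python, not Lucene's API: the ALIASES table, analyze() and
rewrite_query() are all names made up for illustration, and the only
assumption is the one above, that the analyzer expands 'strong' into
two tokens.

```python
# Toy model only: this is NOT Lucene's API, just the behavior described above.
ALIASES = {"strong": ["strong", "tough"]}  # hypothetical expansion table


def analyze(text):
    """Lowercase, split on whitespace, and expand any token listed in ALIASES."""
    tokens = []
    for word in text.lower().split():
        tokens.extend(ALIASES.get(word, [word]))
    return tokens


def rewrite_query(query):
    """Analyze each query term; a term that yields several tokens becomes a phrase."""
    clauses = []
    for term in query.split():
        prefix = "+" if term.startswith("+") else ""
        tokens = analyze(term.lstrip("+"))
        if not tokens:
            continue  # nothing left after analysis, drop the clause
        if len(tokens) == 1:
            body = tokens[0]
        else:
            body = '"' + " ".join(tokens) + '"'
        clauses.append(prefix + body)
    return " ".join(clauses)


print(rewrite_query("+strong force"))  # +"strong tough" force
```

The expanded tokens are kept together, in order, as one phrase clause;
they are not turned into an OR of independent terms.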

While this may look weird to you, that's because this is a really
bogus example.  Tokenizers for Western languages generally don't expand
a single token into more than one; this is more common in Asian
languages, where a complicated ideograph is broken down into simpler
ones.  So a more comparable example in English would be to tokenize
the word "HaagenDazs" into "ice cream".  Then a search for
/+HaagenDazs chocolate/ becomes /+"ice cream" chocolate/, which is
pretty reasonable-looking behavior.

> Doesn't the double quote notation "term term" indicate a phrase?

Yes

> If so, translating '+strong will' to +("strong tough") (a phrase)
> does not seem right since you now require both 'strong' and 'tough'
> to appear in the indexed document, and in that order. I think the
> generated multiple terms should have an OR relation.

That's only because the tokenization example you gave is not
representative of what an analyzer would do in expanding one term into
more than one token.

Also, don't forget that the source text will have been processed by
the same analyzer.  So even if you do tokenize "jack" into "yummy
cheese", and the query analyzer turns /+jack beanstalk/ into 
/+"yummy cheese" beanstalk/, it will still match documents that
had 'jack' and 'beanstalk' in their input text.  
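
A small sketch of why this works, again as a toy model in Python and
not Lucene's API.  EXPANSIONS, analyze(), contains_phrase() and
matches_required_term() are hypothetical names, and the sketch only
models the required (+) clause; the optional 'beanstalk' clause is
left out.

```python
# Toy model only: this is NOT Lucene's API.
EXPANSIONS = {"jack": ["yummy", "cheese"]}  # hypothetical analyzer rule from the example


def analyze(text):
    """The one analyzer used for BOTH document text and query terms."""
    tokens = []
    for word in text.lower().split():
        tokens.extend(EXPANSIONS.get(word, [word]))
    return tokens


def contains_phrase(doc_tokens, phrase_tokens):
    """True if phrase_tokens occur consecutively, in order, in doc_tokens."""
    n = len(phrase_tokens)
    return any(
        doc_tokens[i:i + n] == phrase_tokens
        for i in range(len(doc_tokens) - n + 1)
    )


def matches_required_term(doc_text, term):
    """Model a required (+) clause: analyze both sides, then phrase-match."""
    return contains_phrase(analyze(doc_text), analyze(term))


doc = "Jack climbed the beanstalk"
print(analyze(doc))                        # ['yummy', 'cheese', 'climbed', 'the', 'beanstalk']
print(matches_required_term(doc, "jack"))  # True
```

Because the document went through the same expansion at index time,
the phrase "yummy cheese" is found exactly where 'jack' was in the
input text.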


> > > I think this is clear from the syntax.
> >
> > Only if you know how to read syntax specifications.  Not everyone who
> > is going to use the query parser does.  Remember, the query parser is
> > aimed at users who don't know what the word 'syntax' means.
> 
> The FAQ is aimed at programmers who embed Lucene in their application,
> not at their end users.

Who will still have to create documentation for _their_ (naive) users.
Why not call attention to the issues that will undoubtedly bite their
users?  Why not give them something they can basically cut and paste
from?

_______________________________________________
Lucene-dev mailing list
[EMAIL PROTECTED]
http://lists.sourceforge.net/lists/listinfo/lucene-dev
