Stefan Groschupf wrote:
> Did I miss something in general to be able to support non-required
> terms in Nutch?
I left OR and nesting out of the API to simplify what query filters have
to process. Nutch's query features are approximately what Google
supported for its first three years. (Google did not add OR until 2000,
I think.)
If we permit optional clauses then we need to make sure that each query
filter can handle them correctly.
For example, the query "+A +B" is translated by query-basic into
something like:
+(title:a OR content:a OR anchors:a OR url:a OR host:a)
+(title:b OR content:b OR anchors:b OR url:b OR host:b)
title:"a b"~999
content:"a b"~999
anchors:"a b"~999
url:"a b"~999
host:"a b"~999
For the query "+A B" (where B is optional), the translation should
simply drop the plus from the second line above. So it should not be
too hard to change query-basic to handle optional terms in the default
field. Perhaps that's the only query filter that would need to be
updated, since it looks like LuceneQueryOptimizer already checks that
filterized clauses are required.
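
To make the intended translation concrete, here is a minimal standalone
sketch. It only builds the query strings shown above and does not use the
real Nutch or Lucene classes. The names QueryBasicSketch, Term, and
translate are hypothetical, and keeping the phrase clauses unchanged when
a term is optional is an assumption on my part.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class QueryBasicSketch {
    // Fields expanded for a default-field term, in the order used above.
    static final String[] FIELDS = {"title", "content", "anchors", "url", "host"};

    /** Hypothetical stand-in for a query clause: a term plus a required flag. */
    public static final class Term {
        final String text;
        final boolean required;

        public Term(String text, boolean required) {
            this.text = text;
            this.required = required;
        }
    }

    /** Returns one string per top-level clause of the translated query. */
    public static List<String> translate(List<Term> terms) {
        List<String> clauses = new ArrayList<>();
        StringBuilder phrase = new StringBuilder();

        for (Term t : terms) {
            String term = t.text.toLowerCase(Locale.ROOT);
            StringBuilder group = new StringBuilder();
            for (String field : FIELDS) {
                if (group.length() > 0) {
                    group.append(" OR ");
                }
                group.append(field).append(':').append(term);
            }
            // The only difference for an optional term: no leading plus.
            clauses.add((t.required ? "+" : "") + "(" + group + ")");

            if (phrase.length() > 0) {
                phrase.append(' ');
            }
            phrase.append(term);
        }

        // Sloppy phrase clauses are always optional; they only boost
        // documents where the terms occur near each other.
        // Assumption: they are emitted regardless of which terms are required.
        if (terms.size() > 1) {
            for (String field : FIELDS) {
                clauses.add(field + ":\"" + phrase + "\"~999");
            }
        }
        return clauses;
    }

    public static void main(String[] args) {
        // "+A B": A is required, B is optional.
        List<Term> query = Arrays.asList(new Term("A", true), new Term("B", false));
        for (String clause : translate(query)) {
            System.out.println(clause);
        }
    }
}
```

Running main for "+A B" prints the same seven lines as the "+A +B"
example, except that the second line has no leading plus. A unit test for
query filtering could start from assertions of exactly this kind.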
It would be good to have some unit tests for query filtering.
Doug