Thanks for the hint.
I would love to add non-required terms and nesting to the Query
object API. I will also provide some unit tests, but since I'm not a
JavaCC geek, it will only extend the Java API, not the query parser.
Would such an extension be welcome?
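A rough sketch of what optional clauses and nesting could look like on the Java side only, leaving the JavaCC grammar untouched. Every class and method name here (NestedQuerySketch, Occur, addTerm, addNested) is hypothetical and is not the existing Nutch Query API; the toString rendering just uses the familiar "+required optional -prohibited" syntax.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: a query whose clauses are required, optional or
 *  prohibited, where each clause is either a single term or a nested query. */
public class NestedQuerySketch {

  enum Occur { REQUIRED, OPTIONAL, PROHIBITED }

  /** A clause wraps either a term or a nested query (exactly one is non-null). */
  static final class Clause {
    final Occur occur;
    final String term;
    final NestedQuery nested;

    Clause(Occur occur, String term, NestedQuery nested) {
      this.occur = occur;
      this.term = term;
      this.nested = nested;
    }

    @Override
    public String toString() {
      String prefix = occur == Occur.REQUIRED ? "+" : occur == Occur.PROHIBITED ? "-" : "";
      // Nested queries are wrapped in parentheses.
      String body = term != null ? term : "(" + nested + ")";
      return prefix + body;
    }
  }

  static final class NestedQuery {
    private final List<Clause> clauses = new ArrayList<>();

    NestedQuery addTerm(String term, Occur occur) {
      clauses.add(new Clause(occur, term, null));
      return this;
    }

    NestedQuery addNested(NestedQuery query, Occur occur) {
      clauses.add(new Clause(occur, null, query));
      return this;
    }

    List<Clause> clauses() {
      return clauses;
    }

    @Override
    public String toString() {
      StringBuilder sb = new StringBuilder();
      for (Clause c : clauses) {
        if (sb.length() > 0) sb.append(' ');
        sb.append(c);
      }
      return sb.toString();
    }
  }

  public static void main(String[] args) {
    NestedQuery inner = new NestedQuery()
        .addTerm("b", Occur.OPTIONAL)
        .addTerm("c", Occur.OPTIONAL);
    NestedQuery query = new NestedQuery()
        .addTerm("a", Occur.REQUIRED)
        .addNested(inner, Occur.REQUIRED)
        .addTerm("d", Occur.PROHIBITED);
    System.out.println(query); // prints "+a +(b c) -d"
  }
}
```

Every query filter would then have to walk clauses() recursively and respect each clause's Occur, which is exactly the extra burden on filters that the reply below raises.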
Stefan
On 12.01.2006 at 18:29, Doug Cutting wrote:
Stefan Groschupf wrote:
Did I miss something in general that would let Nutch support
non-required terms?
I left OR and nesting out of the API to simplify what query filters
have to process. Nutch's query features are approximately what
Google supported for its first three years. (Google did not add OR
until 2000, I think.)
If we permit optional clauses then we need to make sure that each
query filter can handle them correctly.
For example, the query "+A +B" is translated by query-basic into
something like:
+(title:a OR content:a OR anchors:a OR url:a OR host:a)
+(title:b OR content:b OR anchors:b OR url:b OR host:b)
title:"a b"~999
content:"a b"~999
anchors:"a b"~999
url:"a b"~999
host:"a b"~999
For the query "+A B" (where B is optional), the translation should be
the same except that the plus is removed from the second line above.
So it should not be too hard to change
query-basic to be able to handle optional terms in the default
field. Perhaps that's the only query filter that would need to be
updated. And it looks like LuceneQueryOptimizer already checks
that filterized clauses are required.
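A minimal sketch of the expansion described above, with a per-term required flag so that "+A +B" and "+A B" differ only in the plus on the second term's disjunction. This is not the query-basic source: the class and method names are hypothetical, the field list and the ~999 slop are copied from the example, terms are assumed to be already lowercased, and real boosts and configuration are left out. Emitting phrase clauses only for two or more terms is also an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the default-field expansion; not the query-basic plugin. */
public class BasicExpansionSketch {

  // Fields and slop taken from the example translation above.
  static final String[] FIELDS = {"title", "content", "anchors", "url", "host"};
  static final int SLOP = 999;

  /** A default-field term and whether the user marked it with '+'. */
  static final class Term {
    final String text;
    final boolean required;

    Term(String text, boolean required) {
      this.text = text;
      this.required = required;
    }
  }

  static List<String> expand(List<Term> terms) {
    List<String> clauses = new ArrayList<>();
    // One field disjunction per term; only required terms keep the leading '+'.
    for (Term t : terms) {
      StringBuilder sb = new StringBuilder(t.required ? "+(" : "(");
      for (int i = 0; i < FIELDS.length; i++) {
        if (i > 0) sb.append(" OR ");
        sb.append(FIELDS[i]).append(':').append(t.text);
      }
      clauses.add(sb.append(')').toString());
    }
    // Optional sloppy-phrase clause per field over all terms (assumed: 2+ terms only).
    if (terms.size() > 1) {
      StringBuilder phrase = new StringBuilder();
      for (Term t : terms) {
        if (phrase.length() > 0) phrase.append(' ');
        phrase.append(t.text);
      }
      for (String field : FIELDS) {
        clauses.add(field + ":\"" + phrase + "\"~" + SLOP);
      }
    }
    return clauses;
  }

  public static void main(String[] args) {
    // "+A B": A required, B optional.
    List<Term> terms = List.of(new Term("a", true), new Term("b", false));
    expand(terms).forEach(System.out::println);
  }
}
```

Because the phrase clauses are already optional, the only change optional terms need in this sketch is the conditional "+", which matches the point that query-basic may be the only filter requiring an update.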
It would be good to have some unit tests for query filtering.
Doug
---------------------------------------------------------------
company: http://www.media-style.com
forum: http://www.text-mining.org
blog: http://www.find23.net