On Thursday 20 March 2003 11:12, Leander Harding wrote:
...
> looks at the terms they contain and highlights them all. Consider the
> following query:
> ("foo" AND "bar") OR "baz"
> Suppose that we search using this query and the following document is a
> hit: <doc>Foo.....quux......baz.</doc>
> Which Terms do we highlight?
> All of the existing highlighting code I've seen would highlight both "foo"
> and "baz", but this isn't correct - the document contains "foo", but no
> "bar", thus, since "foo" in the query is part of an AND expression that
> wasn't satisfied by this document, only "baz" should be highlighted.
> So my questions three, are thus:
> What's the best way to go about this?
> Has anyone been working on anything similar?
> Is there already API to make this possible that I'm overlooking?
I think that some of the proposed/planned changes would make implementing
this a bit easier (see the mailing list archives for the discussion). However,
there is a slight difficulty in "reverse engineering" the 'and' and 'or'
relationships from the query itself (backtracking from the Query object to
see how required/prohibited/optional terms form ANDed/ORed groups).
I don't think any of the proposed solutions would easily give you that
grouping information.
Another, similar problem is matching phrase hits; they too cannot simply be
highlighted using just a set of all the individual terms in the query.
What you will probably end up doing is rebuilding the query tree and
evaluating its branches, pruning the ones that do not result in a hit, then
using that (optimized) tree for highlighting.
Then again, that evaluation part is already done by the search
functionality... so perhaps you could reuse parts of it?
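
To make the prune-and-evaluate idea concrete, here is a minimal sketch in
plain Java. It does not use any Lucene classes: Term, And, Or and
termsToHighlight are hypothetical names invented for illustration, and the
"document" is just a string split on non-alphanumeric characters. Prohibited
terms and phrase queries are deliberately left out. Each branch returns the
terms it matched, or null if the branch is not satisfied, so an unsatisfied
AND contributes nothing to the highlight set.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

// Illustrative only: a toy query tree, not Lucene's Query/BooleanQuery API.
final class HighlightSketch {
    interface Node {}

    static final class Term implements Node {
        final String text;
        Term(String text) { this.text = text.toLowerCase(Locale.ROOT); }
    }

    static final class And implements Node {
        final List<Node> children;
        And(Node... children) { this.children = Arrays.asList(children); }
    }

    static final class Or implements Node {
        final List<Node> children;
        Or(Node... children) { this.children = Arrays.asList(children); }
    }

    // Returns the terms matched by a satisfied branch, or null if the
    // branch is not satisfied by the document (i.e. it is pruned).
    static Set<String> matched(Node node, Set<String> docTerms) {
        if (node instanceof Term) {
            String t = ((Term) node).text;
            if (!docTerms.contains(t)) return null;
            Set<String> one = new TreeSet<>();
            one.add(t);
            return one;
        }
        if (node instanceof And) {
            Set<String> all = new TreeSet<>();
            for (Node child : ((And) node).children) {
                Set<String> m = matched(child, docTerms);
                if (m == null) return null; // one failed child prunes the AND
                all.addAll(m);
            }
            return all;
        }
        if (node instanceof Or) {
            Set<String> any = null;
            for (Node child : ((Or) node).children) {
                Set<String> m = matched(child, docTerms);
                if (m == null) continue;    // skip pruned alternatives
                if (any == null) any = new TreeSet<>();
                any.addAll(m);
            }
            return any;
        }
        throw new IllegalArgumentException("unknown node type");
    }

    // Naive tokenization: lowercase, split on anything non-alphanumeric.
    static Set<String> termsToHighlight(Node query, String text) {
        Set<String> docTerms = new HashSet<>(
            Arrays.asList(text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")));
        Set<String> m = matched(query, docTerms);
        return m == null ? Collections.<String>emptySet() : m;
    }

    public static void main(String[] args) {
        Node query = new Or(new And(new Term("foo"), new Term("bar")),
                            new Term("baz"));
        System.out.println(
            termsToHighlight(query, "Foo.....quux......baz."));
    }
}
```

For the query and document from the original question, the AND branch is
pruned because "bar" is missing, so only "baz" is left to highlight. A real
implementation would have to work from Lucene's own query objects and
analyzers rather than this toy tree.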
-+ Tatu +-