27 apr 2007 kl. 14.11 skrev Erik Hatcher:
On Apr 27, 2007, at 6:39 AM, karl wettin wrote:
27 apr 2007 kl. 12.36 skrev Erik Hatcher:
Unless someone has some other tricks I'm not aware of, that is.
I guess it would be possible to add start/stop-tokens such as ^
and $ to the indexed text: "^ the $" and place a phrase query with
0 slop.
True true. That'd work too.
I was thinking about this today. And I'm still thinking, don't take
this too serious. I just want to see if I can implement this a less
hacky way.
Number of terms in the field is what is missing in order to implement
a Query the "correct way", right? Clone the norms-code.
SpanCompleteFieldQuery would extend SpanNearQuery, have slop 0 and
require [tokens in field] clauses. To me this is more compelling than
the ^$ hack. However, if there are no other features one can think of
this information will yeild, the hack might just turn out to be better.
I can't think of anything I'd call a feature:
Norms could be calculated in a higher resolution instead of beeing
stored as a float. What is most expensive, to convert the byte to
float or divide a bunch at query time?
Rebuilding term vectors using skipTo() might save some by not seeking
more than nessecary.
Match only terms in fields that are between n and m tokens long.
However, this might be better of discretized in a few bins, or
perhaps even possible to estimated based on the (existing
implementation) norm value?
What else is there?
--
karl
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]