On 27 Apr 2007, at 14:11, Erik Hatcher wrote:


On Apr 27, 2007, at 6:39 AM, karl wettin wrote:
On 27 Apr 2007, at 12:36, Erik Hatcher wrote:

Unless someone has some other tricks I'm not aware of, that is.

I guess it would be possible to add start/stop tokens such as ^ and $ to the indexed text, e.g. "^ the $", and run a phrase query with 0 slop.

True true.   That'd work too.
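
A minimal sketch of the sentinel-token idea, in plain Python rather than against the Lucene API. The names index_field, phrase_match and matches_whole_field are made up for illustration, and whitespace tokenization is an assumption; the point is only that an exact (slop 0) phrase match on "^ the $" succeeds when the field holds nothing but "the".

```python
# Hypothetical sentinel tokens marking the start and end of a field.
START, END = "^", "$"

def index_field(text):
    """Tokenize on whitespace (an assumption) and wrap with sentinels."""
    return [START] + text.lower().split() + [END]

def phrase_match(tokens, phrase):
    """Exact phrase match with slop 0: the phrase must appear contiguously."""
    n = len(phrase)
    return any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))

def matches_whole_field(field_text, query_text):
    """True only if the query covers the entire field, sentinels included."""
    return phrase_match(index_field(field_text), index_field(query_text))
```

With this, matches_whole_field("the", "the") holds, while a field containing "the quick fox" is not matched by the query "the", even though an ordinary phrase query for "the" would hit it.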

I was thinking about this today. And I'm still thinking, so don't take this too seriously. I just want to see if I can implement this in a less hacky way.

The number of terms in the field is what is missing in order to implement a Query the "correct way", right? One could clone the norms code to store it. SpanCompleteFieldQuery would extend SpanNearQuery, have slop 0 and require [tokens in field] clauses. To me this is more compelling than the ^$ hack. However, if there are no other features one can think of that this information would yield, the hack might just turn out to be better.
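
Roughly what I have in mind, as a toy Python model rather than real Lucene code. TinyIndex and complete_field_query are hypothetical names, and the per-document field_length dictionary stands in for the token count that would have to be stored alongside the norms. Requiring as many clauses as there are tokens, in order and with slop 0, forces the match to start at position 0 and end at the last token.

```python
from collections import defaultdict

class TinyIndex:
    """Toy positional index that also stores the token count per document."""

    def __init__(self):
        self.postings = defaultdict(dict)  # term -> {doc_id: [positions]}
        self.field_length = {}             # doc_id -> number of tokens

    def add(self, doc_id, text):
        tokens = text.lower().split()      # whitespace tokenization (assumption)
        self.field_length[doc_id] = len(tokens)
        for pos, term in enumerate(tokens):
            self.postings[term].setdefault(doc_id, []).append(pos)

    def complete_field_query(self, text):
        """Return doc ids whose field consists of exactly the query terms, in order."""
        terms = text.lower().split()
        if not terms:
            return []
        hits = []
        for doc_id, length in self.field_length.items():
            # The stored token count must equal the number of clauses.
            if length != len(terms):
                continue
            # Slop 0, in order: clause i must occur at position i.
            if all(i in self.postings.get(term, {}).get(doc_id, ())
                   for i, term in enumerate(terms)):
                hits.append(doc_id)
        return sorted(hits)
```

The loop over all documents is only there to keep the sketch short; a real implementation would walk the postings of the clauses and check the length last.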

I can't think of anything I'd call a feature:

Norms could be calculated at a higher resolution instead of being stored as a float. Which is more expensive: converting the byte to a float, or doing a bunch of divisions at query time?

Rebuilding term vectors using skipTo() might save some time by not seeking more than necessary.

Match only terms in fields that are between n and m tokens long. However, this might be better off discretized into a few bins, or it might even be possible to estimate it from the norm value of the existing implementation.
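
A rough sketch of both variants in Python. It assumes the norm is simply 1/sqrt(number of tokens) with no field or document boost mixed in, and it ignores the precision lost when the norm is packed into a single byte, so a real estimate would be coarser than this. The function names and the bin edges are made up for illustration.

```python
import math

def length_norm(num_tokens):
    """Assumed length norm: 1/sqrt(n), with no boosts applied."""
    return 1.0 / math.sqrt(num_tokens)

def estimated_length(norm):
    """Invert the assumed norm to recover an approximate token count."""
    return 1.0 / (norm * norm)

def length_in_range(norm, n, m):
    """Filter: is the estimated field length between n and m tokens?"""
    return n <= round(estimated_length(norm)) <= m

def length_bin(num_tokens, edges=(1, 2, 4, 8, 16, 32)):
    """Discretize a token count: index of the first edge >= num_tokens."""
    for i, edge in enumerate(edges):
        if num_tokens <= edge:
            return i
    return len(edges)  # overflow bin for anything longer than the last edge
```

With boosts folded into the norm the inversion no longer gives the length, which is one argument for storing the bin (or the count itself) separately.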


What else is there?


--

karl
