Greets,

What would the consequences be of eliminating Similarity.queryNorm()? I cargo-culted that method when porting, but now I'm going through and trying to refactor for simplicity's sake. If I can zap it, I'd like to.

First, the theoretical angle:

According to the Similarity docs, queryNorm() doesn't impact document ranking, since it is applied as a multiplier to the scores of all matching docs. I don't see how it's all that useful, then.
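To make the docs' claim concrete, here is a minimal Python sketch. It is not Lucene code: `rank`, `apply_norm`, and the score values are made up for illustration, and it assumes the multiplier is a single positive constant shared by every hit.

```python
def rank(scores):
    """Return doc ids ordered from highest score to lowest."""
    return sorted(scores, key=scores.get, reverse=True)


def apply_norm(scores, norm):
    """Multiply every score by the same query-level factor."""
    return {doc: score * norm for doc, score in scores.items()}


# Hypothetical raw scores for three matching docs.
raw = {"doc_a": 2.4, "doc_b": 0.9, "doc_c": 1.7}

# Any positive constant will do; 0.25 is arbitrary.
normed = apply_norm(raw, 0.25)

# The absolute scores change, the ordering does not.
print(rank(raw))
print(rank(normed))
```

If that assumption holds, the factor only rescales scores and the two orderings are identical.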

How important is it that separate queries, or queries against different indexes, produce scores within a "comparable" range? It seems to me that if you need that, you can always normalize after the search completes by setting the top score to 1.0 and scaling the other scores proportionately. Are there any cases where that solution wouldn't be adequate?
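A rough sketch of the post-search normalization I have in mind, in Python. The function name `normalize_to_top` and the sample hits are hypothetical, and the handling of an empty or non-positive result set is my own assumption about what would be sensible.

```python
def normalize_to_top(scores):
    """Rescale scores so the best hit is 1.0 and the rest keep their ratios."""
    if not scores:
        return {}
    top = max(scores.values())
    if top <= 0:
        # Nothing sensible to divide by; hand back an unchanged copy.
        return dict(scores)
    return {doc: score / top for doc, score in scores.items()}


# Hypothetical hits from one search.
hits = {"doc_a": 4.0, "doc_b": 1.0, "doc_c": 2.0}
scaled = normalize_to_top(hits)
print(scaled)
```

Note that this makes the top hit of every query score 1.0 regardless of how good the match was, which may be exactly the kind of case where it falls short.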

Second, the implementation angle:

Is it really true that document ranking is unaffected by queryNorm()? It seems to me that when multi-level boolean queries are normalized, clauses having different IDFs would end up with different multipliers. Maybe I'm wrong -- it's always hard to wrap your head around recursion -- but is the assertion that ranking is unaffected a documentation glitch?
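Here is a toy Python model of the worry. It does not reproduce Lucene's actual normalization; the per-clause contributions and multipliers are invented, and the only point is that unequal multipliers across clauses can reorder hits where a single shared multiplier cannot.

```python
def score(contributions, multipliers):
    """Sum each clause's contribution after applying that clause's multiplier."""
    return sum(c * m for c, m in zip(contributions, multipliers))


def rank(docs, multipliers):
    """Return doc ids ordered from highest score to lowest."""
    return sorted(docs, key=lambda d: score(docs[d], multipliers), reverse=True)


# Hypothetical contributions from two clauses of a boolean query.
docs = {
    "doc_x": (3.0, 1.0),
    "doc_y": (1.0, 3.5),
}

# Same multiplier for both clauses: the ordering matches the unnormalized one.
uniform = rank(docs, (0.5, 0.5))

# Different multiplier per clause: the ordering can flip.
uneven = rank(docs, (1.0, 0.5))

print(uniform)
print(uneven)
```

So the question boils down to whether the recursion ever hands sibling or nested clauses different normalization factors.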

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/

