: What I'm trying to do is prevent Lucene from ranking documents that
: use a single term many times above documents that match more distinct
: terms.
:
: I've got some huge queries with quite a number of unique terms.  I
: want the documents that hit more unique terms to float to the top,
: while documents that hit some or few of the terms to sink to the
: bottom (even if they have more occurrences of those terms).
:
: Lucene, as I understand things, does this for the most part, though it
: is possible that term frequency can play a significant role and drown
: out the part of the desired behavior that I'd like to keep.

your best two choices for tweaking this behavior are to make term
frequency less significant, or make the coord factor for boolean queries
more significant.
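
to make the two knobs concrete, here is a toy model of the trade-off. it is NOT Lucene code: the class and method names are made up for illustration, and idf, norms and boosts are left out. the "default" formulas (tf = sqrt(freq), coord = overlap/maxOverlap) are what DefaultSimilarity uses; the "flat" and "steep" variants are just assumptions about what a tweak might look like. in a real index you would typically get the same effect by subclassing DefaultSimilarity and overriding tf() and coord().

```java
// Toy model of the two knobs; this is NOT Lucene code and omits idf, norms, and boosts.
class CoordVsTfSketch {
    // Lucene's default tf is sqrt(freq); shown here for comparison.
    static float defaultTf(float freq) {
        return (float) Math.sqrt(freq);
    }

    // Hypothetical flattened tf: any match counts once, however often the term occurs.
    static float flatTf(float freq) {
        return freq > 0 ? 1.0f : 0.0f;
    }

    // Lucene's default coord is overlap / maxOverlap.
    static float defaultCoord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }

    // Hypothetical steeper coord: squaring penalizes missing terms harder.
    static float steepCoord(int overlap, int maxOverlap) {
        float c = overlap / (float) maxOverlap;
        return c * c;
    }

    // Simplified score: sum of tf over the query terms, multiplied by coord.
    // freqs[i] is how often query term i occurs in the document (0 = no match).
    static float score(int[] freqs, boolean flat, boolean steep) {
        float sum = 0f;
        int overlap = 0;
        for (int f : freqs) {
            if (f > 0) overlap++;
            sum += flat ? flatTf(f) : defaultTf(f);
        }
        float coord = steep
                ? steepCoord(overlap, freqs.length)
                : defaultCoord(overlap, freqs.length);
        return sum * coord;
    }

    public static void main(String[] args) {
        int[] oneTermManyHits = {100, 0, 0, 0}; // one term, repeated 100 times
        int[] threeTermsOnce = {1, 1, 1, 0};    // three distinct terms, once each
        System.out.println("default:     " + score(oneTermManyHits, false, false)
                + " vs " + score(threeTermsOnce, false, false));
        System.out.println("flat tf:     " + score(oneTermManyHits, true, false)
                + " vs " + score(threeTermsOnce, true, false));
        System.out.println("steep coord: " + score(oneTermManyHits, false, true)
                + " vs " + score(threeTermsOnce, false, true));
    }
}
```

in this toy model the document with one heavily repeated term wins under the default formulas, and loses as soon as either tf is flattened or coord is made steeper.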

: I guess what I'm asking is, is freq, the value passed to tf(), the
: count of the term, or a ratio of the term to total terms in the index?

for term queries it is the literal term frequency, i.e. the raw count of
the term in the matching document (you can see this by looking at the
Explanation info for a query)

: "This factor does not affect document ranking (since all ranked
: documents are multiplied by the same factor), but rather just attempts
: to make scores from different queries (or even different indexes)
: comparable."
:
: This to me is black magic.  It implies that one can do two different
: queries and a merge-sort, and further that the content can come from
: different indexes.

it is, in fact, black magic ... as the phrase says it *attempts* to make
scores from different queries comparable ... it does not actually make
them mathematically comparable, since scores are completely unbounded.
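
the "does not affect document ranking" part of that quote is easy to check for yourself. a minimal sketch, assuming made-up scores and a made-up norm value (the class and method names are hypothetical, nothing here is Lucene API): multiplying every score from one query by the same positive factor leaves the order untouched.

```java
import java.util.Arrays;
import java.util.Comparator;

// Illustration only: hypothetical names and values, not Lucene API.
class QueryNormRankingSketch {
    // Returns document indexes ordered by descending score.
    static Integer[] rank(float[] scores) {
        Integer[] order = new Integer[scores.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        Arrays.sort(order, Comparator.comparingDouble((Integer i) -> -scores[i]));
        return order;
    }

    // Multiplies every score by the same factor, as queryNorm does within one query.
    static float[] scale(float[] scores, float factor) {
        float[] out = new float[scores.length];
        for (int i = 0; i < scores.length; i++) {
            out[i] = scores[i] * factor;
        }
        return out;
    }

    public static void main(String[] args) {
        float[] raw = {2.5f, 0.25f, 1.7f}; // made-up raw scores for docs 0, 1, 2
        float norm = 0.37f;                // made-up queryNorm value
        System.out.println(Arrays.toString(rank(raw)));
        System.out.println(Arrays.toString(rank(scale(raw, norm))));
    }
}
```

both lines print the same ordering; what the factor changes is only the absolute size of the scores, which is exactly the part that is not safely comparable across queries.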

As i recall, a more practical purpose for the queryNorm is that when
dealing with large complex query structures consisting of "container"
queries (BooleanQueries, DisjunctionMaxQueries, SpanNearQueries, etc...)
the queryNorm is applied to the "leaf" queries as the computation
proceeds, which helps keep the scores from getting unmanageably large
(and losing precision) as they are aggregated up.

when dealing with floats, where 0<n<1 ...
  A*n + B*n + C*n + ... + Z*n
...results in a more "precise" calculation than...
  (A + B + C + ... + Z)*n

...correct?
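
if you want to test that rather than take my word for it, here is a minimal sketch that computes both forms in float and measures each against a double-precision reference. the class name, method names and sample values are all made up for illustration, and i am deliberately not claiming which form comes out ahead: that can depend on the magnitudes involved, so try it with values shaped like your real scores.

```java
// Illustration only: hypothetical names and sample values.
class FloatSumSketch {
    // Scale each term first, then add: A*n + B*n + ... + Z*n
    static float scaleThenSum(float[] terms, float n) {
        float sum = 0f;
        for (float t : terms) {
            sum += t * n;
        }
        return sum;
    }

    // Add first, then scale once: (A + B + ... + Z)*n
    static float sumThenScale(float[] terms, float n) {
        float sum = 0f;
        for (float t : terms) {
            sum += t;
        }
        return sum * n;
    }

    // Double-precision reference used to measure the float error.
    static double reference(float[] terms, float n) {
        double sum = 0.0;
        for (float t : terms) {
            sum += (double) t;
        }
        return sum * (double) n;
    }

    public static void main(String[] args) {
        float[] terms = {1.5f, 2.25f, 1000.125f, 0.001f}; // made-up leaf scores
        float n = 0.01f;                                  // made-up norm, 0 < n < 1
        double ref = reference(terms, n);
        System.out.println("scale-then-sum error: " + Math.abs(scaleThenSum(terms, n) - ref));
        System.out.println("sum-then-scale error: " + Math.abs(sumThenScale(terms, n) - ref));
    }
}
```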




-Hoss

