On 26 Apr 2006, at 19:18, Doug Cutting wrote:
karl wettin wrote:
How about refactoring fields to something like:
[Document](fieldName)<#>---- {0..1} ->[Field +boost]<#>---- {0..*} -> [FieldValue +store +index +termVector]
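One reading of the diagram above, as a minimal Java sketch: each field name in a Document maps to at most one Field, which carries the boost, and that Field owns any number of FieldValues, each with its own store/index/termVector flags. The class names mirror the diagram but are hypothetical. They are not the existing org.apache.lucene.document classes, and the flags are plain booleans purely for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical classes mirroring the diagram; not org.apache.lucene.document.*.

// One value of a field, carrying its own storage and indexing flags.
final class FieldValue {
    final String text;
    final boolean store;
    final boolean index;
    final boolean termVector;

    FieldValue(String text, boolean store, boolean index, boolean termVector) {
        this.text = text;
        this.store = store;
        this.index = index;
        this.termVector = termVector;
    }
}

// A named field: one boost shared by all of its values ({0..*} FieldValues).
final class Field {
    final String name;
    float boost = 1.0f;
    private final List<FieldValue> values = new ArrayList<>();

    Field(String name) {
        this.name = name;
    }

    Field addValue(FieldValue value) {
        values.add(value);
        return this; // allow chaining
    }

    List<FieldValue> values() {
        return Collections.unmodifiableList(values);
    }
}

// The (fieldName) qualifier: each name maps to at most one Field ({0..1}).
final class Document {
    private final Map<String, Field> fields = new LinkedHashMap<>();

    // Returns the Field for this name, creating it on first use.
    Field field(String name) {
        return fields.computeIfAbsent(name, Field::new);
    }

    // Returns the Field for this name, or null if the document has none.
    Field getField(String name) {
        return fields.get(name);
    }
}

public class FieldModelSketch {
    public static void main(String[] args) {
        Document doc = new Document();
        Field title = doc.field("title");
        title.boost = 2.0f;
        title.addValue(new FieldValue("Lucene 2.0", true, true, false))
             .addValue(new FieldValue("Lucene two", false, true, true));
        System.out.println(doc.getField("title").values().size()); // 2
    }
}
```

Because the boost lives on Field rather than on each value, all values stored under one name share a single boost, while storage and indexing choices can still differ per value.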
If you think you have a simple, back-compatible way to do this,
please submit a patch. Perhaps it is simpler than I imagined.
Long-term, an API which supports per-token boosting will
probably prove useful, as part of #11 on
http://wiki.apache.org/jakarta-lucene/Lucene2Whiteboard.
I've wanted that feature a few times. Let me know if there is
something I can do to help when the time is right.
The time will be right as soon as someone decides they want to
implement this! Ideally every part of the index would be
pluggable, but the most important is postings, so probably we
should start there.
My idea is that the logic of DocumentWriter
I would prefer to leave persistence and deprecation out of the
discussion until the rest is solved, as I spend all my spare brain
time on the InstanciatedIndex thingy.
and also probably a no-positions version, a no-freqs version and a
weight-per-position version. TermFreqs and TermPositions should be
replaced with a generic Postings API. Applications can then
downcast the Postings instance based on the FieldInfo.
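A minimal sketch of what a generic Postings hierarchy with the variants mentioned above might look like, together with a consumer that downcasts based on per-field metadata. Every type here (Postings, FreqPostings, PositionPostings, WeightedPositionPostings, PostingsFormat, FieldInfo, ArrayPostings) is hypothetical and written for illustration only; in particular, this FieldInfo is a stand-in and not Lucene's own class of that name.

```java
// Hypothetical generic postings API; none of these types exist in Lucene as written.
interface Postings {
    boolean next(); // advance to the next document; false when exhausted
    int doc();      // current document number
}

interface FreqPostings extends Postings {
    int freq();     // occurrences of the term in the current document
}

interface PositionPostings extends FreqPostings {
    int nextPosition(); // call at most freq() times per document
}

interface WeightedPositionPostings extends PositionPostings {
    float weight(); // weight of the position most recently returned
}

// Which variant was written for a field (hypothetical per-field metadata).
enum PostingsFormat { DOCS_ONLY, DOCS_AND_FREQS, POSITIONS, WEIGHTED_POSITIONS }

final class FieldInfo {
    final String name;
    final PostingsFormat format;

    FieldInfo(String name, PostingsFormat format) {
        this.name = name;
        this.format = format;
    }
}

// In-memory implementation of the richest variant, used only for the demo.
final class ArrayPostings implements WeightedPositionPostings {
    private final int[] docs;
    private final int[][] positions; // positions[i]: positions within docs[i]
    private final float[][] weights; // weights[i][j]: weight of positions[i][j]
    private int cur = -1;            // index of the current document
    private int pos = -1;            // index of the last position returned

    ArrayPostings(int[] docs, int[][] positions, float[][] weights) {
        this.docs = docs;
        this.positions = positions;
        this.weights = weights;
    }

    public boolean next() { pos = -1; return ++cur < docs.length; }
    public int doc() { return docs[cur]; }
    public int freq() { return positions[cur].length; }
    public int nextPosition() { return positions[cur][++pos]; }
    public float weight() { return weights[cur][pos]; }
}

public class PostingsSketch {
    // Sums a per-document contribution, downcasting according to FieldInfo.
    static float sumContributions(FieldInfo info, Postings postings) {
        float total = 0f;
        while (postings.next()) {
            switch (info.format) {
                case DOCS_ONLY:
                    total += 1f; // only "the term occurs here" is known
                    break;
                case DOCS_AND_FREQS:
                case POSITIONS:
                    total += ((FreqPostings) postings).freq();
                    break;
                case WEIGHTED_POSITIONS: {
                    WeightedPositionPostings w = (WeightedPositionPostings) postings;
                    for (int k = 0, n = w.freq(); k < n; k++) {
                        w.nextPosition();
                        total += w.weight(); // each occurrence carries its own weight
                    }
                    break;
                }
            }
        }
        return total;
    }

    public static void main(String[] args) {
        int[] docs = {3, 7};
        int[][] positions = {{0, 5}, {2}};
        float[][] weights = {{1.0f, 0.5f}, {2.0f}};
        FieldInfo weighted = new FieldInfo("body", PostingsFormat.WEIGHTED_POSITIONS);
        FieldInfo freqsOnly = new FieldInfo("body", PostingsFormat.DOCS_AND_FREQS);
        System.out.println(sumContributions(weighted, new ArrayPostings(docs, positions, weights)));  // 3.5
        System.out.println(sumContributions(freqsOnly, new ArrayPostings(docs, positions, weights))); // 3.0
    }
}
```

The casts are safe only as long as the FieldInfo and the runtime type of the Postings agree; if they disagree, the failure surfaces as a ClassCastException at run time rather than at compile time.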
This is much more interesting from my point of view. Let's start here.
I might be wrong, and I can't really say why it is so bad, but I
think casting based on FieldInfo would break the Liskov
substitution principle in a big way.
My own immediate thought is to compromise by allowing a boost per term
in each document: simply remove the norms methods from IndexReader, add
a new one to TermEnum, and fall back on the field boost. How would the
value be picked up by the scorer?
Boost per position, etc. sounds very expensive.
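One possible answer to the scorer question, sketched under assumptions: the per-term boost is exposed on a postings-style iterator as the scorer walks the documents, and the scorer falls back to a per-document field boost (standing in for the norms) when no term boost was stored. Putting the accessor on a postings iterator rather than on TermEnum is itself an assumption, since a (term, document) value is only known once a document is current. BoostedPostings, ArrayBoostedPostings and TermBoostSketch are hypothetical, and sqrt(freq) times the boost is a toy score chosen only for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical postings that can also report a boost for the current (term, document) pair.
interface BoostedPostings {
    boolean next();     // advance to the next document; false when exhausted
    int doc();          // current document number
    int freq();         // occurrences of the term in the current document
    boolean hasBoost(); // was a per-term boost stored for this document?
    float boost();      // meaningful only when hasBoost() is true
}

// In-memory demo implementation; Float.NaN marks "no per-term boost stored".
final class ArrayBoostedPostings implements BoostedPostings {
    private final int[] docs;
    private final int[] freqs;
    private final float[] boosts;
    private int i = -1;

    ArrayBoostedPostings(int[] docs, int[] freqs, float[] boosts) {
        this.docs = docs;
        this.freqs = freqs;
        this.boosts = boosts;
    }

    public boolean next() { return ++i < docs.length; }
    public int doc() { return docs[i]; }
    public int freq() { return freqs[i]; }
    public boolean hasBoost() { return !Float.isNaN(boosts[i]); }
    public float boost() { return boosts[i]; }
}

public class TermBoostSketch {
    // Per-term boost if present, otherwise the field boost for that document
    // (fieldBoostByDoc is a hypothetical stand-in for per-field norms).
    static float effectiveBoost(BoostedPostings p, float[] fieldBoostByDoc) {
        return p.hasBoost() ? p.boost() : fieldBoostByDoc[p.doc()];
    }

    // Toy scorer: sqrt(freq) * effective boost, keyed by document number.
    static Map<Integer, Float> score(BoostedPostings p, float[] fieldBoostByDoc) {
        Map<Integer, Float> scores = new TreeMap<>();
        while (p.next()) {
            float tf = (float) Math.sqrt(p.freq());
            scores.put(p.doc(), tf * effectiveBoost(p, fieldBoostByDoc));
        }
        return scores;
    }

    public static void main(String[] args) {
        BoostedPostings p = new ArrayBoostedPostings(
                new int[] {0, 1, 2},
                new int[] {1, 4, 1},
                new float[] {Float.NaN, 0.5f, 3.0f});
        float[] fieldBoosts = {2.0f, 1.0f, 1.0f};
        System.out.println(score(p, fieldBoosts)); // {0=2.0, 1=1.0, 2=3.0}
    }
}
```

Here document 0 has no stored term boost and falls back to its field boost of 2.0, while documents 1 and 2 use their own term boosts. The cost is one extra value per (term, document) posting, which is still far less than a value per position.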