On 26 Apr 2006, at 19:18, Doug Cutting wrote:

karl wettin wrote:
How about refactoring fields to something like:
[Document](fieldName)<#>---- {0..1} ->[Field +boost]<#>---- {0..*} -> [FieldValue +store +index +termVector]

If you think you have a simple, back-compatible way to do this, please submit a patch. Perhaps it is simpler than I imagined.

Long-term, an API which supports per-token boosting will probably prove useful, as a part of #11 on http://wiki.apache.org/jakarta-lucene/Lucene2Whiteboard.
I've wanted that feature a few times. Let me know if there is something I can do to help when the time is right.

The time will be right as soon as someone decides they want to implement this! Ideally every part of the index would be pluggable, but the most important is postings, so probably we should start there.

My idea is that the logic of DocumentWriter

I would prefer to leave persistence and deprecation out of the discussion until the rest is solved, as I spend all my spare brain time on the InstanciatedIndex-thingy.

and also probably a no-positions version, a no-freqs version and a weight-per-position version. TermFreqs and TermPositions should be replaced with a generic Postings API. Applications can then downcast the Postings instance based on the FieldInfo.
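
To make the downcasting idea concrete, here is a minimal sketch of what a generic Postings API could look like. Everything in it is an assumption made for illustration: `Postings`, `FreqPostings`, `FieldInfoSketch`, `storesFreqs` and the array-backed implementations are hypothetical names, not existing Lucene classes, and a real design would read from the index files rather than from arrays.

```java
// Hypothetical sketch only: none of these types exist in Lucene under these names.

// Base view shared by every postings format: iterate over document numbers.
interface Postings {
    boolean next();   // advance to the next document; false when exhausted
    int doc();        // current document number
}

// Richer view for fields that also record term frequencies.
interface FreqPostings extends Postings {
    int freq();       // occurrences of the term in the current document
}

// Stand-in for the per-field metadata an application would consult.
final class FieldInfoSketch {
    final String name;
    final boolean storesFreqs;

    FieldInfoSketch(String name, boolean storesFreqs) {
        this.name = name;
        this.storesFreqs = storesFreqs;
    }
}

// A "no-freqs" postings list backed by an array of document numbers.
final class DocOnlyPostings implements Postings {
    private final int[] docs;
    private int i = -1;

    DocOnlyPostings(int[] docs) { this.docs = docs; }

    public boolean next() { return ++i < docs.length; }
    public int doc() { return docs[i]; }
}

// A postings list that also carries one frequency per document.
final class ArrayFreqPostings implements FreqPostings {
    private final int[] docs;
    private final int[] freqs;
    private int i = -1;

    ArrayFreqPostings(int[] docs, int[] freqs) {
        this.docs = docs;
        this.freqs = freqs;
    }

    public boolean next() { return ++i < docs.length; }
    public int doc() { return docs[i]; }
    public int freq() { return freqs[i]; }
}

final class PostingsDemo {
    // The application decides, from the field metadata, which view to cast to.
    static int totalFreq(FieldInfoSketch info, Postings postings) {
        int total = 0;
        while (postings.next()) {
            if (info.storesFreqs) {
                total += ((FreqPostings) postings).freq(); // downcast chosen via field info
            } else {
                total += 1; // no frequencies stored: count each document once
            }
        }
        return total;
    }

    public static void main(String[] args) {
        FieldInfoSketch body = new FieldInfoSketch("body", true);
        Postings p = new ArrayFreqPostings(new int[] {1, 4}, new int[] {3, 2});
        System.out.println(totalFreq(body, p));
    }
}
```

The cast is only safe as long as the field metadata and the postings implementation agree, which is the coupling the design relies on.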

This is much more interesting from my point of view. Let's start here.

I might be wrong, and I really don't know why it is so bad, but I think casting based on FieldInfo would break the Liskov substitution principle in a big way.

My own immediate thought is to compromise by allowing a boost per term in a document: simply remove the norms methods from IndexReader, add a new one to TermEnum, and fall back on the field boost. How would the value be picked up by the scorer?
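
A rough sketch of the fallback rule, just to pin down what I mean. `TermBoosts`, `setTermBoost` and `boost` are hypothetical names invented for this mail, not existing Lucene API, and the sketch says nothing about how the value would be stored in the index or reach the scorer.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: per-term boosts for one field of one document,
// falling back on the field-level boost when a term has no boost of its own.
final class TermBoosts {
    private final float fieldBoost;
    private final Map<String, Float> perTerm = new HashMap<>();

    TermBoosts(float fieldBoost) {
        this.fieldBoost = fieldBoost;
    }

    // Record an explicit boost for a single term.
    void setTermBoost(String term, float boost) {
        perTerm.put(term, boost);
    }

    // Return the term's own boost if one was set, otherwise the field boost.
    float boost(String term) {
        Float b = perTerm.get(term);
        return b != null ? b : fieldBoost;
    }

    public static void main(String[] args) {
        TermBoosts boosts = new TermBoosts(1.5f);
        boosts.setTermBoost("lucene", 3.0f);
        System.out.println(boosts.boost("lucene"));
        System.out.println(boosts.boost("index"));
    }
}
```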

Boost per position, etc., sounds very expensive.

