karl wettin wrote:

This could lead me to believe I can use different boost for fields with the same name within one document.

You can. The values are multiplied to produce the final boost value for the field.
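As a rough illustration of that multiplication, here is a minimal standalone sketch. It does not use Lucene's classes: FieldInstance and combinedBoosts are hypothetical names made up for this example, and the sketch only models the rule stated above (boosts of same-named fields in one document are multiplied together).

```java
import java.util.LinkedHashMap;
import java.util.Map;

class BoostSketch {
    /** Hypothetical stand-in for one field instance: a name plus a boost. */
    static final class FieldInstance {
        final String name;
        final float boost;

        FieldInstance(String name, float boost) {
            this.name = name;
            this.boost = boost;
        }
    }

    /** Multiplies together the boosts of all instances that share a name. */
    static Map<String, Float> combinedBoosts(FieldInstance... fields) {
        Map<String, Float> result = new LinkedHashMap<>();
        for (FieldInstance f : fields) {
            // First occurrence stores the boost; later ones multiply into it.
            result.merge(f.name, f.boost, (a, b) -> a * b);
        }
        return result;
    }
}
```

With two "title" instances boosted 2.0 and 1.5, the combined boost for "title" in this model is their product, not two separate per-value boosts, which is why it differs from what Karl describes.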

It's not really the same thing as I tried to describe though.

No, it's not, you're right.

How about refactoring fields to something like:
[Document](fieldName)<#>---- {0..1} ->[Field +boost]<#>---- {0..*} -> [FieldValue +store +index +termVector]
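One possible reading of that diagram, sketched as plain Java. This is an assumption about what the proposal means, not an existing Lucene API: every class and member name below is hypothetical, and only the attribute names (boost, store, index, termVector) come from the diagram.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class FieldModelSketch {
    /** One value of a field, carrying the per-value flags from the diagram. */
    static final class FieldValue {
        final String text;
        final boolean store;
        final boolean index;
        final boolean termVector;

        FieldValue(String text, boolean store, boolean index, boolean termVector) {
            this.text = text;
            this.store = store;
            this.index = index;
            this.termVector = termVector;
        }
    }

    /** At most one Field per name; it owns the boost and zero or more values. */
    static final class Field {
        final String name;
        float boost = 1.0f;
        final List<FieldValue> values = new ArrayList<>();

        Field(String name) {
            this.name = name;
        }
    }

    /** A document keyed by field name, so each name maps to 0..1 Field. */
    static final class Document {
        private final Map<String, Field> fields = new LinkedHashMap<>();

        /** Returns the field with this name, creating it on first use. */
        Field field(String name) {
            return fields.computeIfAbsent(name, Field::new);
        }

        int fieldCount() {
            return fields.size();
        }
    }
}
```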


That would be a big, incompatible change to one of Lucene's primary APIs, no?

Not if I got it right in my head. Then it's really just a matter of handling deprecation. The field methods in Document could stay the same.

If you think you have a simple, back-compatible way to do this, please submit a patch. Perhaps it is simpler than I imagined.

Long-term, an API which supports per-token boosting will probably prove useful, as a part of #11 on http://wiki.apache.org/jakarta-lucene/Lucene2Whiteboard.

I've wanted that feature a few times. Let me know if there is something I can do to help when the time is right.

The time will be right as soon as someone decides they want to implement this! Ideally every part of the index would be pluggable, but the most important is postings, so probably we should start there.

My idea is that the logic of DocumentWriter.invertDocument() remain much the same, and that DocumentWriter.addPosition() is replaced with a method on a pluggable class. So invertDocument() would keep a FieldIndexer for each field and call a method like addPosition() for each token found. (We might add a boost field to Token that's passed into this method.) Then, at the end, invertDocument() would flush all of the FieldIndexers. SegmentMerger would need to be changed similarly. Implementing FieldIndexers that can sensibly share output files may be tricky. We should implement FieldIndexers that are back-compatible with the existing index format, and also probably a no-positions version, a no-freqs version and a weight-per-position version. TermFreqs and TermPositions should be replaced with a generic Postings API. Applications can then downcast the Postings instance based on the FieldInfo.
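A minimal sketch of how that might look, under stated assumptions. FieldIndexer, addPosition() and invertDocument() are the names used above, but the signatures here are guesses; the two implementations, the string output standing in for index files, and the pre-tokenized input are all hypothetical simplifications. Nothing here is Lucene's actual code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class PluggablePostingsSketch {
    /** Hypothetical plug-in point; signatures are assumed, not Lucene's. */
    interface FieldIndexer {
        void addPosition(String term, int position, float boost);
        void flush(List<String> out);
    }

    /** Keeps every position of every term; boost is ignored here. */
    static final class PositionsIndexer implements FieldIndexer {
        private final Map<String, List<Integer>> postings = new LinkedHashMap<>();

        public void addPosition(String term, int position, float boost) {
            postings.computeIfAbsent(term, t -> new ArrayList<>()).add(position);
        }

        public void flush(List<String> out) {
            for (Map.Entry<String, List<Integer>> e : postings.entrySet()) {
                out.add(e.getKey() + " positions=" + e.getValue());
            }
            postings.clear();
        }
    }

    /** No-positions variant: keeps only a frequency per term. */
    static final class FreqsOnlyIndexer implements FieldIndexer {
        private final Map<String, Integer> freqs = new LinkedHashMap<>();

        public void addPosition(String term, int position, float boost) {
            freqs.merge(term, 1, Integer::sum);
        }

        public void flush(List<String> out) {
            for (Map.Entry<String, Integer> e : freqs.entrySet()) {
                out.add(e.getKey() + " freq=" + e.getValue());
            }
            freqs.clear();
        }
    }

    /**
     * Feeds each token to its field's indexer, then flushes all indexers.
     * Assumes an indexer is registered for every field name.
     */
    static List<String> invertDocument(Map<String, String[]> tokensByField,
                                       Map<String, FieldIndexer> indexers) {
        for (Map.Entry<String, String[]> field : tokensByField.entrySet()) {
            FieldIndexer indexer = indexers.get(field.getKey());
            String[] tokens = field.getValue();
            for (int pos = 0; pos < tokens.length; pos++) {
                indexer.addPosition(tokens[pos], pos, 1.0f); // default boost
            }
        }
        List<String> out = new ArrayList<>();
        for (FieldIndexer indexer : indexers.values()) {
            indexer.flush(out);
        }
        return out;
    }
}
```

The point of the sketch is that invertDocument() never needs to know which kind of postings a field keeps; a weight-per-position variant would be a third implementation that actually uses the boost argument.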

Doug
