karl wettin wrote:
This could lead me to believe I can use different boost for fields
with the same name within one document.
You can. The values are multiplied to produce the final boost value
for the field.
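
To illustrate the multiplication described above, here is a minimal sketch in plain Java. It does not use the Lucene classes: BoostedDoc and effectiveBoost() are hypothetical names invented for this example, and the sketch assumes only what the reply states, namely that the boosts of same-named fields in one document are multiplied together.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a document; this is NOT Lucene's Document class.
final class BoostedDoc {
    // Parallel lists: one entry per field instance added to the document.
    private final List<String> names = new ArrayList<>();
    private final List<Float> boosts = new ArrayList<>();

    // Adds one field instance with its own boost.
    void add(String name, float boost) {
        names.add(name);
        boosts.add(boost);
    }

    // Product of the boosts of every field instance with this name.
    // Returns 1.0f (the neutral boost) when no such field was added.
    float effectiveBoost(String name) {
        float product = 1.0f;
        for (int i = 0; i < names.size(); i++) {
            if (names.get(i).equals(name)) {
                product *= boosts.get(i);
            }
        }
        return product;
    }
}
```

With two "title" instances boosted 2.0 and 1.5, the sketch yields a single field boost of 3.0, so the individual values cannot be recovered afterwards. That is why this is not the same as giving each value its own weight.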
It's not really the same thing as I tried to describe though.
No, it's not, you're right.
How about refactoring fields to something like:
[Document](fieldName)<#>---- {0..1} ->[Field +boost]<#>---- {0..*}
-> [FieldValue +store +index +termVector]
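
One possible reading of the diagram above, sketched in plain Java. The class names (SketchDocument, SketchField, SketchFieldValue) and the field() accessor are hypothetical and exist only for this illustration; the sketch assumes that a document maps each field name to at most one field, that the boost lives on the field, and that the store, index and termVector flags live on each value.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// One value of a field, carrying its own storage and indexing flags (hypothetical).
final class SketchFieldValue {
    final String text;
    final boolean store;
    final boolean index;
    final boolean termVector;

    SketchFieldValue(String text, boolean store, boolean index, boolean termVector) {
        this.text = text;
        this.store = store;
        this.index = index;
        this.termVector = termVector;
    }
}

// A named field: one boost, zero or more values (hypothetical).
final class SketchField {
    final String name;
    float boost = 1.0f;
    final List<SketchFieldValue> values = new ArrayList<>();

    SketchField(String name) {
        this.name = name;
    }
}

// A document keyed by field name: zero or one field per name (hypothetical).
final class SketchDocument {
    private final Map<String, SketchField> fields = new LinkedHashMap<>();

    // Returns the field for this name, creating it on first use.
    SketchField field(String name) {
        return fields.computeIfAbsent(name, SketchField::new);
    }

    // Returns the field for this name, or null if none was created.
    SketchField get(String name) {
        return fields.get(name);
    }
}
```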
That would be a big, incompatible change to one of Lucene's primary
APIs, no?
Not if I got it right in my head. Then it's really just a matter of
handling deprecation. The field methods in Document could be the same.
If you think you have a simple, back-compatible way to do this, please
submit a patch. Perhaps it is simpler than I imagined.
Long-term, an API which supports per token boosting will probably
prove useful, as a part of #11 on
http://wiki.apache.org/jakarta-lucene/Lucene2Whiteboard.
I've wanted that feature a few times. Let me know if there is something
I can do to help when the time is right.
The time will be right as soon as someone decides they want to implement
this! Ideally every part of the index would be pluggable, but the most
important is postings, so probably we should start there.
My idea is that the logic of DocumentWriter.invertDocument() remain much
the same, and that DocumentWriter.addPosition() is replaced with a
method on a pluggable class. So invertDocument() would keep a
FieldIndexer for each field and call a method like addPosition() for
each token found. (We might add a boost field to Token that's passed
into this method.) Then, at the end, invertDocument() would flush all
of the FieldIndexers. SegmentMerger would need to be changed
similarly. Implementing FieldIndexers that can sensibly share output
files may be tricky. We should implement FieldIndexers that are
back-compatible with the existing index format, and also probably a
no-positions version, a no-freqs version and a weight-per-position
version. TermFreqs and TermPositions should be replaced with a generic
Postings API. Applications can then downcast the Postings instance
based on the FieldInfo.
Doug
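
The design outlined above could be sketched roughly as follows, in plain Java and outside Lucene. Nothing here is an existing Lucene API: the FieldIndexer interface, its addPosition() and flush() signatures, SketchToken with its boost field, and SketchInverter are all assumptions made for illustration. The real postings files are replaced by a list of strings so the sketch stays self-contained.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical token carrying the per-token boost suggested in the thread.
final class SketchToken {
    final String term;
    final float boost;

    SketchToken(String term, float boost) {
        this.term = term;
        this.boost = boost;
    }
}

// Hypothetical pluggable class: one instance per field.
interface FieldIndexer {
    void addPosition(SketchToken token, int position);
    void flush(List<String> out);
}

// Records term and position only, ignoring the token boost.
final class PositionsIndexer implements FieldIndexer {
    private final List<String> buffered = new ArrayList<>();

    public void addPosition(SketchToken token, int position) {
        buffered.add(token.term + "@" + position);
    }

    public void flush(List<String> out) {
        out.addAll(buffered);
        buffered.clear();
    }
}

// Weight-per-position variant: also records the token boost.
final class WeightedPositionsIndexer implements FieldIndexer {
    private final List<String> buffered = new ArrayList<>();

    public void addPosition(SketchToken token, int position) {
        buffered.add(token.term + "@" + position + "^" + token.boost);
    }

    public void flush(List<String> out) {
        out.addAll(buffered);
        buffered.clear();
    }
}

// Stand-in for the inversion loop: keeps one FieldIndexer per field,
// feeds it every token, then flushes all indexers at the end.
final class SketchInverter {
    private final Map<String, FieldIndexer> indexers = new LinkedHashMap<>();

    void register(String field, FieldIndexer indexer) {
        indexers.put(field, indexer);
    }

    void invert(String field, List<SketchToken> tokens) {
        FieldIndexer indexer = indexers.get(field);
        if (indexer == null) {
            throw new IllegalArgumentException("no indexer for field " + field);
        }
        int position = 0;
        for (SketchToken token : tokens) {
            indexer.addPosition(token, position++);
        }
    }

    // Flushes the indexers in registration order and returns their output.
    List<String> flushAll() {
        List<String> out = new ArrayList<>();
        for (FieldIndexer indexer : indexers.values()) {
            indexer.flush(out);
        }
        return out;
    }
}
```

The sketch leaves out the hard parts named above: sharing output files between indexers, the matching change to SegmentMerger, and the read-side Postings API.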