Renaud Delbru wrote:
Hi Michael,
Michael McCandless wrote:
Also, this issue was just opened:
https://issues.apache.org/jira/browse/LUCENE-1419
which would make it possible for classes in the same package
(oal.index) to use their own indexing chain. With that fix, if you
make your own classes in the oal.index package, and perhaps subclass
the above classes, you could then create your own indexing
chain? If you take that approach, please report back so we
can learn how to improve Lucene for these very advanced
customizations!
As a first impression, it would be handy, for customizing postings
lists, to make FreqProxTermsWriter an abstract class that separates
segment creation from the serialisation of term information.
This class would implement the generic logic for flushing and
appending postings, but would delegate to subclasses the way
doc + freq and prox + payload info are written.
A first idea would be to have the following abstract methods:
- writeMinState: called by appendPostings; defines how to
serialise one FreqProxFieldMergeState
- writeDocFreq: called by writeMinState; defines how to
serialise docs and freqs
- writeProx: called by writeMinState; defines how to serialise
positions and payloads
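
To make the proposed split concrete, here is a minimal Java sketch of
the three abstract methods. It is an illustration only, not Lucene
code: PostingsWriterSketch, MergeStateSketch and
TextPostingsWriterSketch are hypothetical names, the parameter lists
are assumptions, and MergeStateSketch is a simplified stand-in that
does not mirror the real FreqProxFieldMergeState.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the per-term state that appendPostings walks over.
// It is NOT Lucene's FreqProxFieldMergeState; it only carries what the sketch needs.
final class MergeStateSketch {
    final String term;
    final int docId;
    final int[] positions; // the term frequency is positions.length

    MergeStateSketch(String term, int docId, int... positions) {
        this.term = term;
        this.docId = docId;
        this.positions = positions;
    }
}

// The generic flushing logic lives here; serialisation is left to subclasses.
abstract class PostingsWriterSketch {
    // Generic part: visit every state in order and hand it to the subclass.
    final void appendPostings(List<MergeStateSketch> states) {
        for (MergeStateSketch state : states) {
            writeMinState(state);
        }
    }

    // Assumed signatures; the proposal only names the methods.
    protected abstract void writeMinState(MergeStateSketch state);
    protected abstract void writeDocFreq(int docId, int freq);
    protected abstract void writeProx(int position);
}

// One possible subclass: a human-readable format collected in memory.
final class TextPostingsWriterSketch extends PostingsWriterSketch {
    final List<String> out = new ArrayList<>();

    @Override
    protected void writeMinState(MergeStateSketch state) {
        out.add("term=" + state.term);
        writeDocFreq(state.docId, state.positions.length);
        for (int position : state.positions) {
            writeProx(position);
        }
    }

    @Override
    protected void writeDocFreq(int docId, int freq) {
        out.add("doc=" + docId + " freq=" + freq);
    }

    @Override
    protected void writeProx(int position) {
        out.add("pos=" + position);
    }
}
```

A different on-disk format would then only need another subclass,
while appendPostings stays untouched.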
I think other parts of the FreqProxTermsWriter can stay generic.
What do you think?
I agree: let's decouple the "codec" (how to write terms/freq/prox)
from the other mechanics in FreqProxTermsWriter.
I don't think FreqProxFieldMergeState should be visible to that codec,
though. That class is used, internally to FreqProxTermsWriter, to
manage the multiple threads that had accumulated postings data.
I think the codec API could look something like this:
newField(...)
startTerm(...)
startDocument(...)
addPosition(...)
endDocument(...)
endTerm(...)
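
As a rough illustration of how such a callback-style codec could be
shaped, here is a minimal Java sketch. Only the six method names come
from the list above; the parameters are assumptions, since the
signatures are left open, and PostingsCodecSketch,
RecordingCodecSketch and CodecDriverSketch are hypothetical names
rather than Lucene classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical interface: the method names follow the list above, but every
// parameter is an assumption, since the proposal leaves the signatures open.
interface PostingsCodecSketch {
    void newField(String fieldName);
    void startTerm(String termText);
    void startDocument(int docId, int termFreq);
    void addPosition(int position);
    void endDocument();
    void endTerm();
}

// Toy codec that records each callback as text instead of writing index files.
final class RecordingCodecSketch implements PostingsCodecSketch {
    final List<String> events = new ArrayList<>();

    @Override public void newField(String fieldName) { events.add("field " + fieldName); }
    @Override public void startTerm(String termText) { events.add("term " + termText); }
    @Override public void startDocument(int docId, int termFreq) {
        events.add("doc " + docId + " freq " + termFreq);
    }
    @Override public void addPosition(int position) { events.add("pos " + position); }
    @Override public void endDocument() { events.add("endDoc"); }
    @Override public void endTerm() { events.add("endTerm"); }
}

// Stand-in for the generic writer side: it drives the callbacks in order
// for a single term occurring in a single document.
final class CodecDriverSketch {
    static void writeOneTerm(PostingsCodecSketch codec, String field, String term,
                             int docId, int... positions) {
        codec.newField(field);
        codec.startTerm(term);
        codec.startDocument(docId, positions.length);
        for (int position : positions) {
            codec.addPosition(position);
        }
        codec.endDocument();
        codec.endTerm();
    }
}
```

In this shape the codec never sees how the postings were buffered or
merged across threads; it only receives the ordered stream of
field, term, document and position events.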
We would then make a codec that matches today's index file format, but
allow others (you) to swap in a new codec. All of this would be
experimental and private to oal.index for starters.
Mike