Renaud Delbru wrote:

Hi Michael,

Michael McCandless wrote:
Also, this issue was just opened:


  https://issues.apache.org/jira/browse/LUCENE-1419

which would make it possible for classes in the same package (oal.index) to use their own indexing chain. With that fix, if you make your own classes in oal.index package, and perhaps subclass the above classes, you could then create your own indexing chain for indexing? If you take that approach, please report back so we can learn how to improve Lucene for these very advanced customizations!

As a first impression, what would be handy for customizing postings lists is an abstract FreqProxTermsWriter class that separates segment creation from the serialisation of term information. This class would implement the generic logic for flushing and appending postings, but would delegate to subclasses how doc + freq and prox + payload info are written.

A first idea would be to have the following abstract methods:
- writeMinState: called by appendPostings; defines how to serialise one FreqProxFieldMergeState
- writeDocFreq: called by writeMinState; defines how to serialise docs and freqs
- writeProx: called by writeMinState; defines how to serialise positions and payloads
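
To make the proposed split concrete, here is a minimal Java sketch of the three abstract methods. It is not Lucene code: PostingState is a hypothetical stand-in for FreqProxFieldMergeState, and the class names, parameter lists and the text "serialisation" are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for one FreqProxFieldMergeState entry; NOT Lucene's class.
final class PostingState {
    final int docID;
    final int[] positions; // term frequency is positions.length

    PostingState(int docID, int... positions) {
        this.docID = docID;
        this.positions = positions;
    }
}

// Hypothetical abstract writer: the generic flush logic lives here,
// the serialisation decisions live in subclasses.
abstract class AbstractPostingsWriter {
    // Generic part: walk the states in order and delegate the encoding.
    final void appendPostings(List<PostingState> states) {
        for (PostingState state : states) {
            writeMinState(state);
        }
    }

    // How to serialise one state (assumed signature).
    protected abstract void writeMinState(PostingState state);

    // How to serialise doc + freq (assumed signature).
    protected abstract void writeDocFreq(int docID, int freq);

    // How to serialise positions; payloads are omitted in this sketch.
    protected abstract void writeProx(int[] positions);
}

// Toy subclass that records a readable trace instead of writing index files.
final class TracingPostingsWriter extends AbstractPostingsWriter {
    final List<String> out = new ArrayList<>();

    @Override
    protected void writeMinState(PostingState state) {
        writeDocFreq(state.docID, state.positions.length);
        writeProx(state.positions);
    }

    @Override
    protected void writeDocFreq(int docID, int freq) {
        out.add("doc=" + docID + " freq=" + freq);
    }

    @Override
    protected void writeProx(int[] positions) {
        for (int position : positions) {
            out.add("pos=" + position);
        }
    }
}
```

A different postings format would then only need another subclass; appendPostings itself would stay untouched.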

I think the other parts of FreqProxTermsWriter can stay generic. What do you think?

I agree: let's decouple the "codec" (how to write terms/freq/prox) from the other mechanics in FreqProxTermsWriter.

I don't think FreqProxFieldMergeState should be visible to that codec, though. That class is used, internally to FreqProxTermsWriter, to manage the multiple threads that had accumulated postings data.

I think the codec API could look something like this:

  newField(...)
    startTerm(...)
      startDocument(...)
        addPosition(...)
      endDocument(...)
    endTerm(...)

We would then make a codec that matches today's index file format, but allow for others (you) to swap in a new codec. All of this would be experimental & private to oal.index for starters.
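
The nested call sequence outlined above could be sketched as a Java interface along these lines. The thread leaves every argument list as "(...)", so the parameters below, as well as the names PostingsConsumer, LoggingCodec and CodecDemo, are assumptions for illustration only and not an actual Lucene API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical consumer API mirroring the nested call sequence.
// All parameter lists are assumptions; the original outline elides them.
interface PostingsConsumer {
    void newField(String fieldName);
    void startTerm(String termText);
    void startDocument(int docID, int termFreq);
    void addPosition(int position);
    void endDocument();
    void endTerm();
}

// Toy codec that records the calls it receives, indented by nesting depth.
final class LoggingCodec implements PostingsConsumer {
    final List<String> log = new ArrayList<>();

    public void newField(String fieldName) { log.add("field " + fieldName); }
    public void startTerm(String termText) { log.add("  term " + termText); }
    public void startDocument(int docID, int termFreq) {
        log.add("    doc " + docID + " freq " + termFreq);
    }
    public void addPosition(int position) { log.add("      pos " + position); }
    public void endDocument() { log.add("    end doc"); }
    public void endTerm() { log.add("  end term"); }
}

final class CodecDemo {
    // Drives a codec the way generic flush code might:
    // one field, one term, one document with two positions.
    static void feed(PostingsConsumer codec) {
        codec.newField("body");
        codec.startTerm("lucene");
        codec.startDocument(42, 2);
        codec.addPosition(0);
        codec.addPosition(5);
        codec.endDocument();
        codec.endTerm();
    }
}
```

Under this shape the codec sees only fields, terms, documents and positions, so an internal class such as FreqProxFieldMergeState never has to be exposed to it.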

Mike
