Paul Elschot wrote:
> I had another look at SegmentTermDocs.skipTo() and at SegmentTermPositions, and I think I'm beginning to get your point. Could it be doable per skipInterval docs?
Almost... but not quite, except maybe for the first segment being merged.

The problem is that the new skip data will not, in general, be "aligned" with the old skip data, except for the first segment. E.g., say skipInterval is 16, and for term "foo" the first segment has 18 docs and the 2nd segment has 22 docs. We could bulk-copy that first chunk of 16 docs from the first segment, but then we write the remaining 2 docs, and 14 docs into the 2nd segment we need to write a new skip entry. So we cannot bulk-copy the 2nd segment, because then we won't know the byte offset at that 14-doc point.

I guess we could entertain allowing skip intervals to be irregular, such that at the boundaries of previously merged segments they're allowed to differ, but that's getting more invasive.

We have recently made great strides in making merging a bulk byte-copy operation when possible (e.g., stored fields and term vectors do this now), so I agree it'd be fabulous to get the postings to do bulk byte copy too. They are now the slowest part of merging.

The frq postings could "almost" be made appendable if we stored the last docID of each posting list in the term dictionary. Then we could append, rewriting only the first document of each segment after the first so that it is the delta between its docID and the last docID in the segment before it. But again we'd be in trouble with the skip data.

Mike
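To make the skip-alignment example and the append idea concrete, here is a small Python sketch over a deliberately simplified model. It is an assumption, not Lucene's actual .frq format: a posting list is just delta-encoded docIDs, with frequencies, positions, and byte offsets ignored, and no deleted documents. The helpers `own_skip_points`, `merged_skip_points`, `encode_deltas`, and `append_postings` are hypothetical illustrations, not Lucene APIs, and a skip entry is modeled as being written every time the count of docs reaches a multiple of skipInterval.

```python
# Hypothetical, simplified model of one term's postings: a sorted list of
# docIDs stored as deltas. Frequencies, positions and Lucene's real byte
# encoding are ignored, and no documents are assumed to be deleted.

def own_skip_points(num_docs, skip_interval):
    """Doc counts within a segment at which that segment wrote skip entries."""
    return list(range(skip_interval, num_docs + 1, skip_interval))


def merged_skip_points(segment_sizes, skip_interval):
    """For each source segment, the doc counts (relative to that segment's
    start) at which the merged posting list needs a skip entry."""
    result = []
    base = 0  # docs of this term already written by earlier segments
    for size in segment_sizes:
        needed = [k - base
                  for k in range(skip_interval, base + size + 1, skip_interval)
                  if k > base]
        result.append(needed)
        base += size
    return result


def encode_deltas(doc_ids):
    """Delta-encode a sorted docID list; the first delta is relative to 0."""
    out, prev = [], 0
    for d in doc_ids:
        out.append(d - prev)
        prev = d
    return out


def append_postings(segments):
    """Append delta-encoded postings from several segments.

    segments: list of (deltas, last_doc_id, max_doc). last_doc_id stands in
    for the hypothetical "last docID" stored in the term dictionary. Only the
    first delta of each segment is rewritten; the rest is copied unchanged.
    """
    merged = []
    doc_base = 0   # first merged docID assigned to the current segment
    prev_last = 0  # last merged docID written so far (0 before any doc)
    for deltas, last_doc_id, max_doc in segments:
        if deltas:  # the term may be missing from some segments
            merged.append(doc_base + deltas[0] - prev_last)
            merged.extend(deltas[1:])  # the doc_base shift cancels out here
            prev_last = doc_base + last_doc_id
        doc_base += max_doc
    return merged


if __name__ == "__main__":
    # The example from the mail: skipInterval 16, 18 docs then 22 docs.
    print(own_skip_points(18, 16), own_skip_points(22, 16))  # [16] [16]
    print(merged_skip_points([18, 22], 16))                  # [[16], [14]]

    seg_a = ([1, 3, 5], 9, 10)  # docIDs 1, 4, 9 in a 10-doc segment
    seg_b = ([0, 2, 5], 7, 8)   # docIDs 0, 2, 7 in an 8-doc segment
    print(append_postings([seg_a, seg_b]))        # [1, 3, 5, 1, 2, 5]
    print(encode_deltas([1, 4, 9, 10, 12, 17]))   # [1, 3, 5, 1, 2, 5]
```

The first segment's skip points line up ([16] in both), but the merged list needs an entry 14 docs into the 2nd segment while that segment only recorded one at 16. `append_postings` fixes up the docID deltas with a single rewritten value per segment, yet it does nothing about that misaligned skip entry, which is the remaining obstacle.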