Hi! I'll try to complement what Simon and Robert said:
On Thu, Nov 8, 2012 at 8:56 AM, eksdev <[email protected]> wrote:
> Just a theoretical question, would it make sense to add some sort of
> StoredDocument[] bulkGet(int[] docId) to fetch multiple stored
> documents in one go?
>
> The reasoning behind this is that now, with compressed blocks,
> random access gets more expensive, and in some cases a user needs to
> fetch more documents in one go. If it happens that more documents
> come from one block, it is a win. I would also assume that, even
> without compression, bulk access on sorted docIds could be a win
> (sequential access)?
>
> Does that make sense, is it doable? Or even worse, does it already
> exist :)

Even with small documents (100 bytes, 160 docs per chunk) and a small
index (100K docs), there would still be 625 chunks, so the probability
that two documents of the same results page land in the same chunk is
very low (roughly 1/625, or 0.16%, for any given pair of documents). So
I think we should not optimize for this case.

However, CompressingStoredFieldsFormat implements efficient sequential
iteration internally in order to improve merging performance: when
merging a segment, every chunk gets decompressed only once. (A rough
sketch of how a bulkGet could exploit the same trick follows at the end
of this message.)

> By the way, I am impressed how well compression does: even on really
> short stored documents, approx. 150 bytes, we observe a 35% reduction.
> Fetching 1000 short documents on a fully cached index is observably
> slower (2-3 times), but as soon as your memory gets low, compression
> wins quickly.

Awesome! Thank you for trying it!
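For the archives, here is a rough, untested sketch of what such a
bulkGet could look like. None of these types or methods
(ChunkedStoredFieldsReader, chunkIndex, readFromChunk) exist in Lucene;
they are hypothetical stand-ins for chunk-aware internals. The point is
just that sorting the docIds makes documents from the same chunk
adjacent, so every chunk gets decompressed at most once:

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BulkGetSketch {

  // Stand-in for a decoded stored document.
  public interface StoredDocument {}

  // Hypothetical reader that exposes chunk boundaries.
  public interface ChunkedStoredFieldsReader {
    // Index of the chunk that contains this docId.
    int chunkIndex(int docId);
    // Decompress one chunk once and decode the requested docs from it.
    List<StoredDocument> readFromChunk(int chunk, int[] docIds);
  }

  public static StoredDocument[] bulkGet(
      ChunkedStoredFieldsReader reader, int[] docIds) {
    int[] sorted = docIds.clone();
    Arrays.sort(sorted); // same-chunk docs become adjacent

    List<StoredDocument> out = new ArrayList<>(sorted.length);
    int i = 0;
    while (i < sorted.length) {
      int chunk = reader.chunkIndex(sorted[i]);
      int j = i + 1;
      while (j < sorted.length && reader.chunkIndex(sorted[j]) == chunk) {
        j++; // extend the run of docIds that share this chunk
      }
      // One decompression serves the whole run [i, j).
      out.addAll(reader.readFromChunk(chunk,
          Arrays.copyOfRange(sorted, i, j)));
      i = j;
    }
    return out.toArray(new StoredDocument[0]);
  }
}

Note that this returns documents in docId order rather than in the order
they were requested, so a real implementation would have to map results
back. And as said above, with 625 chunks the runs would almost always
have length 1, which is why I don't think it is worth optimizing for.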
--
Adrien