> Would you like to submit a patch that changes SortingMergePolicy to
> use the approach that you are proposing using bitsets instead of
> sorting int[] arrays?

Sure can do that. Can you open a ticket for this, as I don't know what versions this can go in?

--
Ravi

On Tue, Apr 28, 2015 at 6:03 PM, Adrien Grand <jpou...@gmail.com> wrote:

> On Tue, Apr 21, 2015 at 10:00 AM, Ravikumar Govindarajan
> <ravikumar.govindara...@gmail.com> wrote:
> > Thanks for the comments…
> >
> >> My only
> >> concern about using the FixedBitSet is that it would make sorting each
> >> postings list run in O(maxDoc) but maybe we can make it better by
> >> using SparseFixedBitSet
> >
> > Yes I was also thinking about this. But we are on 4.x and did not take the
> > plunge. But as you said, it should be a good idea to test on SFBS
>
> Would you like to submit a patch that changes SortingMergePolicy to
> use the approach that you are proposing using bitsets instead of
> sorting int[] arrays?
>
> >> I'm curious if you already performed any kind of benchmarking of this
> >> approach?
> >
> > Yes we did a stress test of sorts aimed at SortingMergePolicy. We made most
> > of our data as RAM resident and then CPU hot-spots came up...
> >
> > There were few take-aways from the test. I am listing down few of them..
> > It's kind of lengthy. Please read through...
> >
> > a) Postings-List issue, as discussed above…
> >
> > b) CompressingStoredFieldsReader did not store the last decoded 32KB chunk.
> > Our segments are already sorted before participating in a merge. On mostly
> > linear merge, we ended up decoding the same chunk again and again. Simply
> > storing the last chunk resulted in good speed-ups for us...
> >
> > c) Once above steps were corrected, the CPU hotspot shifted to
> > BlockDocsEnum. Here most of our postings-list < 128 docs. So
> > Lucene41Postings started using vInts… I did try ForUtil encoding even for
> > < 128 docs. It definitely went easy on CPU. But failed to measure resulting
> > file-size increase.
> >
> > I realised not just SMP but any other merge must face the same issue and
> > left it at that..
>
> True.
> Like Robert said, there has been work done on b) already and I
> think we can move forward on a) too. Thanks for sharing your findings!
>
> --
> Adrien
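
For readers following the thread, here is a rough sketch of the idea behind point a): remapping one postings list through a doc map either by sorting an int[] or by marking bits and reading them back in order. It is only an illustration under assumptions, not the SortingMergePolicy code. The class name, method names and the oldToNew array are hypothetical, and java.util.BitSet stands in for Lucene's FixedBitSet. Frequencies and positions, which a real postings list also carries, are left out.

```java
import java.util.Arrays;
import java.util.BitSet;

// Hypothetical illustration; none of these names come from Lucene.
class PostingsRemapSketch {

    // Baseline: map each old doc ID to its new ID, then sort the mapped array.
    public static int[] remapWithSort(int[] oldDocs, int[] oldToNew) {
        int[] mapped = new int[oldDocs.length];
        for (int i = 0; i < oldDocs.length; i++) {
            mapped[i] = oldToNew[oldDocs[i]];
        }
        Arrays.sort(mapped);
        return mapped;
    }

    // Bitset variant: set one bit per new doc ID, then collect the set bits
    // in ascending order. Assumes oldToNew is a permutation of 0..maxDoc-1
    // and oldDocs has no duplicates, so the bit count equals oldDocs.length.
    public static int[] remapWithBitSet(int[] oldDocs, int[] oldToNew, int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        for (int oldDoc : oldDocs) {
            bits.set(oldToNew[oldDoc]);
        }
        int[] mapped = new int[oldDocs.length];
        int i = 0;
        for (int doc = bits.nextSetBit(0); doc >= 0; doc = bits.nextSetBit(doc + 1)) {
            mapped[i++] = doc;
        }
        return mapped;
    }

    public static void main(String[] args) {
        int[] oldToNew = {4, 0, 3, 1, 5, 2}; // old doc ID -> new doc ID
        int[] postings = {0, 2, 3, 5};       // old doc IDs for one term
        System.out.println(Arrays.toString(remapWithSort(postings, oldToNew)));
        System.out.println(Arrays.toString(remapWithBitSet(postings, oldToNew, oldToNew.length)));
    }
}
```

The bitset pass avoids the sort, but a dense bitset sized to maxDoc has to be scanned (and allocated or cleared) for every postings list, however short the list is. That is the O(maxDoc) concern raised in the thread and the reason a sparse bitset was suggested.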