On Tue, Apr 21, 2015 at 10:00 AM, Ravikumar Govindarajan <ravikumar.govindara...@gmail.com> wrote:
> Thanks for the comments…
>
>> My only concern about using the FixedBitSet is that it would make sorting each
>> postings list run in O(maxDoc) but maybe we can make it better by
>> using SparseFixedBitSet
>
> Yes I was also thinking about this. But we are on 4.x and did not take the
> plunge. But as you said, it should be a good idea to test on SFBS
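
To make the trade-off quoted above concrete, here is a minimal sketch of the two ways to put a remapped postings list back into doc ID order. It is not code from Lucene or from either poster: `PostingsRemapSketch`, `remapBySorting`, `remapByBitSet` and the `oldToNew` array are hypothetical names, and `java.util.BitSet` stands in for Lucene's `FixedBitSet` so that the example has no dependencies. It assumes `oldToNew` is a permutation of the doc IDs and handles doc IDs only, not frequencies or positions.

```java
import java.util.Arrays;
import java.util.BitSet;

final class PostingsRemapSketch {

    /** Remap each doc ID, then sort the int[]: O(n log n) in the postings length n. */
    static int[] remapBySorting(int[] postings, int[] oldToNew) {
        int[] remapped = new int[postings.length];
        for (int i = 0; i < postings.length; i++) {
            remapped[i] = oldToNew[postings[i]];
        }
        Arrays.sort(remapped);
        return remapped;
    }

    /**
     * Remap each doc ID into a dense bitset, then read the set bits back in order.
     * No comparison sort is needed, but allocating and scanning the bitset costs
     * work proportional to maxDoc even when the postings list is very short.
     */
    static int[] remapByBitSet(int[] postings, int[] oldToNew, int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        for (int doc : postings) {
            bits.set(oldToNew[doc]);
        }
        int[] remapped = new int[postings.length];
        int i = 0;
        for (int doc = bits.nextSetBit(0); doc >= 0; doc = bits.nextSetBit(doc + 1)) {
            remapped[i++] = doc;
        }
        return remapped;
    }

    public static void main(String[] args) {
        int[] oldToNew = {4, 0, 3, 1, 2}; // hypothetical doc map for a 5-doc segment
        int[] postings = {0, 2, 3};       // old doc IDs containing some term
        System.out.println(Arrays.toString(remapBySorting(postings, oldToNew)));
        System.out.println(Arrays.toString(remapByBitSet(postings, oldToNew, oldToNew.length)));
    }
}
```

The per-list cost tied to maxDoc in the second method is the concern raised in the quote; a sparse bitset would presumably reduce it for short postings lists, which is why testing with SparseFixedBitSet is suggested.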

Would you like to submit a patch that changes SortingMergePolicy to use
the approach that you are proposing using bitsets instead of sorting
int[] arrays?

>> I'm curious if you already performed any kind of benchmarking of this
>> approach?
>
> Yes we did a stress test of sorts aimed at SortingMergePolicy. We made most
> of our data as RAM resident and then CPU hot-spots came up...
>
> There were few take-aways from the test. I am listing down few of them..
> It's kind of lengthy. Please read through...
>
> a) Postings-List issue, as discussed above…
>
> b) CompressingStoredFieldsReader did not store the last decoded 32KB chunk.
> Our segments are already sorted before participating in a merge. On mostly
> linear merge, we ended up decoding the same chunk again and again. Simply
> storing the last chunk resulted in good speed-ups for us...
>
> c) Once above steps were corrected, the CPU hotspot shifted to
> BlockDocsEnum. Here most of our postings-list < 128 docs. So
> Lucene41Postings started using vInts… I did try ForUtil encoding even for
> < 128 docs. It definitely went easy on CPU. But failed to measure resulting
> file-size increase.
>
> I realised not just SMP but any other merge must face the same issue and
> left it at that..

True. Like Robert said, there has been work done on b) already and I
think we can move forward on a) too. Thanks for sharing your findings!

-- 
Adrien
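
As an illustration of point b) in the quoted findings, the sketch below shows the general idea of remembering the last decoded chunk so that mostly linear reads do not decompress the same chunk repeatedly. It is not the actual CompressingStoredFieldsReader code or the fix that was committed: `LastChunkCache`, `docsPerChunk` and the `decoder` callback are hypothetical, and a fixed number of documents per chunk is assumed purely to keep the example short.

```java
import java.util.function.IntFunction;

final class LastChunkCache {
    private final int docsPerChunk;              // assumption: fixed docs per chunk
    private final IntFunction<String[]> decoder; // hypothetical: chunk index -> decoded docs
    private int cachedChunk = -1;                // index of the chunk currently held
    private String[] cachedDocs;                 // its decoded documents
    int decodeCount = 0;                         // number of real decodes, for inspection

    LastChunkCache(int docsPerChunk, IntFunction<String[]> decoder) {
        this.docsPerChunk = docsPerChunk;
        this.decoder = decoder;
    }

    /** Returns the stored document, decoding its chunk only if it is not the cached one. */
    String document(int docId) {
        int chunk = docId / docsPerChunk;
        if (chunk != cachedChunk) {
            cachedDocs = decoder.apply(chunk);
            cachedChunk = chunk;
            decodeCount++;
        }
        return cachedDocs[docId % docsPerChunk];
    }

    public static void main(String[] args) {
        LastChunkCache cache = new LastChunkCache(4, chunk -> {
            String[] docs = new String[4];
            for (int i = 0; i < 4; i++) {
                docs[i] = "doc" + (chunk * 4 + i);
            }
            return docs;
        });
        for (int doc = 0; doc < 8; doc++) {
            cache.document(doc); // sequential reads, as in a mostly linear merge
        }
        System.out.println("decodes: " + cache.decodeCount);
    }
}
```

With sequential access, each chunk is decoded once instead of once per document. Access that alternates between chunks would still decode on every switch, so this only helps when the read order is mostly linear, as described in the quote.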