Sorry for the delay; I opened https://issues.apache.org/jira/browse/LUCENE-6469. It can go into trunk and 5.x (the value of x depending on when it's ready :)).
On Thu, Apr 30, 2015 at 9:02 AM, Ravikumar Govindarajan <ravikumar.govindara...@gmail.com> wrote:
>>
>> Would you like to submit a patch that changes SortingMergePolicy to
>> use the approach that you are proposing using bitsets instead of
>> sorting int[] arrays?
>
> Sure, I can do that. Can you open a ticket for this? I don't know which
> versions this can go into.
>
> --
> Ravi
>
> On Tue, Apr 28, 2015 at 6:03 PM, Adrien Grand <jpou...@gmail.com> wrote:
>
>> On Tue, Apr 21, 2015 at 10:00 AM, Ravikumar Govindarajan
>> <ravikumar.govindara...@gmail.com> wrote:
>>> Thanks for the comments…
>>>
>>>> My only concern about using the FixedBitSet is that it would make
>>>> sorting each postings list run in O(maxDoc), but maybe we can make it
>>>> better by using SparseFixedBitSet.
>>>
>>> Yes, I was also thinking about this, but we are on 4.x and did not take
>>> the plunge. As you said, it would be a good idea to test with SFBS.
>>
>> Would you like to submit a patch that changes SortingMergePolicy to
>> use the approach that you are proposing using bitsets instead of
>> sorting int[] arrays?
>>
>>>> I'm curious if you already performed any kind of benchmarking of this
>>>> approach?
>>>
>>> Yes, we did a stress test of sorts aimed at SortingMergePolicy. We made
>>> most of our data RAM-resident, and then CPU hot spots showed up.
>>>
>>> There were a few takeaways from the test; I am listing some of them
>>> below. It's somewhat lengthy, so please read through.
>>>
>>> a) The postings-list issue, as discussed above.
>>>
>>> b) CompressingStoredFieldsReader did not retain the last decoded 32KB
>>> chunk. Our segments are already sorted before participating in a
>>> merge, so on a mostly linear merge we ended up decoding the same chunk
>>> again and again. Simply keeping the last chunk gave us good speed-ups.
>>>
>>> c) Once the above issues were fixed, the CPU hot spot shifted to
>>> BlockDocsEnum. Most of our postings lists have fewer than 128 docs, so
>>> Lucene41Postings falls back to vInts for them. I did try ForUtil
>>> encoding even for lists of fewer than 128 docs. It definitely went easy
>>> on CPU, but I did not measure the resulting file-size increase.
>>>
>>> I realised that not just SMP but any other merge must face the same
>>> issue, and left it at that.
>>
>> True. Like Robert said, there has been work done on b) already and I
>> think we can move forward on a) too. Thanks for sharing your findings!
>>
>> --
>> Adrien
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-user-h...@lucene.apache.org

--
Adrien