Hi all,

We ran the benchmarks (6.6 vs 7.1) with the IW info stream enabled. Because attachments cannot be too large, I uploaded the results to Google Drive. They can be found here:

https://drive.google.com/open?id=1-nAHgpPO3qZ78lnvvlQ0_lF4uHJ-cWLh

Thanks in advance,
-Rob

On Mon, Jan 29, 2018 at 1:08 PM, Rob Audenaerde <[email protected]> wrote:

> Hi Uwe,
>
> Thanks for the reply. We commit often. Actually, in the benchmark, we
> commit every 60 documents (but we will run a larger set with fewer commits).
> The number of commits we call does not change between 6.6 and 7.1. In our
> production systems we commit every 5000 documents.
>
> We dug deeper into the commit methods, and the main difference currently
> seems to be the calls to java.util.zip.Checksum.update().
> The number of calls to that method is around 11M in 6.6 and 21M in 7.1, so
> almost twice as many.
>
> -Rob
>
> On Mon, Jan 29, 2018 at 12:18 PM, Uwe Schindler <[email protected]> wrote:
>
>> Hi,
>>
>> How often do you commit? If you index the data initially (that's the case
>> where indexing needs to be fast), one would call commit at the end of the
>> whole job, so the actual time it takes is not so important.
>>
>> If you have a system where the index is updated all the time, then of
>> course committing is also something you have to take into account. Systems
>> like Solr or Elasticsearch use a transaction log in parallel to indexing,
>> so they commit very seldom. If the system crashes, the changes since the
>> last commit are replayed from the transaction log.
>>
>> Uwe
>>
>> -----
>> Uwe Schindler
>> Achterdiek 19, D-28357 Bremen
>> http://www.thetaphi.de
>> eMail: [email protected]
>>
>> > -----Original Message-----
>> > From: Rob Audenaerde [mailto:[email protected]]
>> > Sent: Monday, January 29, 2018 11:29 AM
>> > To: [email protected]
>> > Subject: Re: indexing performance 6.6 vs 7.1
>> >
>> > Hi all,
>> >
>> > Some follow up (sorry for the delay).
>> >
>> > We built a benchmark in our application, and profiled it (on a smallish
>> > data set). What we currently see in the profiler is that in Lucene 7.1 the
>> > calls to `commit()` take much longer.
>> >
>> > The self-time committing in 6.6: 3,215 ms
>> > The self-time committing in 7.1: 10,187 ms
>> >
>> > We will try to run a larger data set and also later with the IW info
>> > stream.
>> >
>> > -Rob
>> >
>> > On Thu, Jan 18, 2018 at 7:03 PM, Erick Erickson <[email protected]> wrote:
>> >
>> > > Robert:
>> > >
>> > > Ah, right. I keep confusing my gmail lists
>> > > "lucene dev"
>> > > and
>> > > "lucene list"....
>> > >
>> > > Siiigggghhhhh.
>> > >
>> > > On Thu, Jan 18, 2018 at 9:18 AM, Adrien Grand <[email protected]> wrote:
>> > > > If you have sparse data, I would have expected index time to *decrease*,
>> > > > not increase.
>> > > >
>> > > > Can you enable the IW info stream and share flush + merge times to see
>> > > > where indexing time goes?
>> > > >
>> > > > If you can run with a profiler, this might also give useful information.
>> > > >
>> > > > On Thu, Jan 18, 2018 at 11:23 AM, Rob Audenaerde <[email protected]> wrote:
>> > > >
>> > > >> Hi all,
>> > > >>
>> > > >> We recently upgraded from Lucene 6.6 to 7.1. We see a significant drop in
>> > > >> indexing performance.
>> > > >>
>> > > >> Our use of Lucene is atypical, as we (also) index some database tables
>> > > >> and add all the values as AssociatedFacetFields as well. This allows us to
>> > > >> create pivot tables on search results really fast.
>> > > >>
>> > > >> These tables have some overlapping columns, but also disjoint ones.
>> > > >>
>> > > >> We anticipated a decrease in index size because of the sparse docvalues. We
>> > > >> see this happening, with decreases to ~50%-80% of the original index size.
>> > > >> But we did not expect a drop in indexing performance (client systems'
>> > > >> indexing time increased by 50% to 250%).
>> > > >>
>> > > >> (Our indexing speed used to be bound mainly by the speed at which the
>> > > >> Taxonomy could deliver new ordinals for new values. We are currently
>> > > >> investigating whether this is still the case, and will report later when
>> > > >> a profiler run has been done.)
>> > > >>
>> > > >> Does anyone know if this increase in indexing time is to be expected as a
>> > > >> result of the sparse docvalues change?
>> > > >>
>> > > >> Kind regards,
>> > > >>
>> > > >> Rob Audenaerde
>> > > >>
>> > >
>> > > ---------------------------------------------------------------------
>> > > To unsubscribe, e-mail: [email protected]
>> > > For additional commands, e-mail: [email protected]
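
For anyone who wants to reproduce the Checksum.update() call counts mentioned in the thread without a profiler, here is a minimal sketch. It is not Lucene code: CountingChecksum and feed() are hypothetical names made up for this illustration, and the chunk sizes are arbitrary. It only shows how a counting delegate around java.util.zip.Checksum can be used, and that the number of update() calls depends on how the bytes are chunked while the checksum value stays the same. It says nothing about why 7.1 makes more calls than 6.6.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.zip.CRC32;
import java.util.zip.Checksum;

/** Hypothetical helper (not part of Lucene): counts update() calls on a delegate Checksum. */
final class CountingChecksum implements Checksum {
    private final Checksum delegate;
    private final AtomicLong calls = new AtomicLong();

    CountingChecksum(Checksum delegate) {
        this.delegate = delegate;
    }

    @Override
    public void update(int b) {
        calls.incrementAndGet();
        delegate.update(b);
    }

    @Override
    public void update(byte[] b, int off, int len) {
        calls.incrementAndGet();
        delegate.update(b, off, len);
    }

    @Override
    public long getValue() {
        return delegate.getValue();
    }

    @Override
    public void reset() {
        delegate.reset();
    }

    /** Number of update() calls seen so far. */
    long calls() {
        return calls.get();
    }

    /** Feeds data to a fresh CRC32 in chunks of chunkSize bytes (chunkSize must be positive). */
    static CountingChecksum feed(byte[] data, int chunkSize) {
        CountingChecksum c = new CountingChecksum(new CRC32());
        for (int off = 0; off < data.length; off += chunkSize) {
            c.update(data, off, Math.min(chunkSize, data.length - off));
        }
        return c;
    }

    public static void main(String[] args) {
        byte[] data = new byte[1 << 20]; // 1 MiB of zeros, just as sample input
        CountingChecksum large = feed(data, 8192);
        CountingChecksum small = feed(data, 4096);
        System.out.println(large.calls() + " calls vs " + small.calls()
                + " calls, same value: " + (large.getValue() == small.getValue()));
    }
}
```

A wrapper like this could be placed around whatever Checksum instance the code under test uses, assuming that instance can be substituted; in a real Lucene run a profiler is likely the more practical way to get the same numbers.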
