Hi,

How often do you commit? If you are indexing the data initially (the case where 
indexing needs to be fast), you would normally call commit only once, at the end 
of the whole job, so the time the commit itself takes is not so important.
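For the initial-load case, here is a minimal sketch against the Lucene 7.x API. It assumes lucene-core 7.x or later is on the classpath; in 6.6, StandardAnalyzer lives in analyzers-common instead. The class name BulkLoader, the field names "id" and "body", and the input rows are hypothetical. The sketch adds all documents, calls commit() exactly once, and can optionally turn on the IndexWriter info stream. The printed wall-clock time is only a rough number and is not the same thing as a profiler's self-time.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.List;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.PrintStreamInfoStream;

// Hypothetical helper: initial bulk load that adds everything, then commits exactly once.
final class BulkLoader {
    /** Returns the number of documents visible to a reader opened after the single commit. */
    static int indexAll(Path indexPath, List<String> rows, boolean infoStream) throws IOException {
        IndexWriterConfig iwc = new IndexWriterConfig(new StandardAnalyzer());
        iwc.setOpenMode(IndexWriterConfig.OpenMode.CREATE); // initial load: start from an empty index
        if (infoStream) {
            // IW info stream: logs IndexWriter's flush, merge and commit activity to stdout.
            iwc.setInfoStream(new PrintStreamInfoStream(System.out));
        }
        try (Directory dir = FSDirectory.open(indexPath)) {
            try (IndexWriter writer = new IndexWriter(dir, iwc)) {
                for (int i = 0; i < rows.size(); i++) {
                    Document doc = new Document();
                    doc.add(new StringField("id", Integer.toString(i), Field.Store.YES));
                    doc.add(new TextField("body", rows.get(i), Field.Store.NO));
                    writer.addDocument(doc); // no commit inside the loop
                }
                long start = System.nanoTime();
                writer.commit(); // the one commit of the whole job
                long commitMs = (System.nanoTime() - start) / 1_000_000;
                System.out.println("commit took " + commitMs + " ms (wall clock)");
            } // close() also commits by default (commitOnClose), but nothing is pending at this point
            // A freshly opened reader sees exactly what was committed.
            try (DirectoryReader reader = DirectoryReader.open(dir)) {
                return reader.numDocs();
            }
        }
    }
}
```

Calling BulkLoader.indexAll(path, rows, true) prints the info stream output followed by the commit time. The flush and merge lines in that output are the information requested further down in this thread.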

If you have a system where the index is updated all the time, then of course 
committing is also something you have to take into account. Systems like Solr 
or Elasticsearch use a transaction log in parallel to indexing, so they commit 
very seldom. If the system crashes, the changes made since the last commit are 
replayed from the transaction log.
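The transaction-log idea can be illustrated with a deliberately simplified in-memory model. This is not how Solr or Elasticsearch implement it. ToyTranslogIndex and its methods are hypothetical names, and the log is only assumed to be durable; here it is a plain list. Every update is appended to the log before it is applied. A commit snapshots the state and truncates the log. After a "crash", the last commit is reloaded and the log is replayed on top of it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: an index that commits rarely and relies on a write-ahead log for recovery.
final class ToyTranslogIndex {
    private final Map<String, String> committed = new HashMap<>(); // state as of the last commit
    private Map<String, String> live = new HashMap<>();            // uncommitted in-memory view
    private final List<String[]> translog = new ArrayList<>();     // ops since last commit (assumed durable)

    void put(String id, String value) {
        translog.add(new String[] {id, value}); // 1. append to the log first (cheap, sequential)
        live.put(id, value);                    // 2. then apply to the in-memory index
    }

    void commit() {
        committed.clear();
        committed.putAll(live); // in a real engine this is the expensive, fsync-heavy step
        translog.clear();       // ops before the commit no longer need replaying
    }

    /** Simulates a crash: uncommitted in-memory state is lost, then the log is replayed. */
    void crashAndRecover() {
        live = new HashMap<>(committed);
        for (String[] op : translog) {
            live.put(op[0], op[1]);
        }
    }

    String get(String id) { return live.get(id); }

    int pendingOps() { return translog.size(); }
}
```

Rare commits therefore cost only a longer replay after a crash, not lost updates. That is the trade-off that makes infrequent commits acceptable in such systems.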

Uwe

-----
Uwe Schindler
Achterdiek 19, D-28357 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -----Original Message-----
> From: Rob Audenaerde [mailto:rob.audenae...@gmail.com]
> Sent: Monday, January 29, 2018 11:29 AM
> To: java-user@lucene.apache.org
> Subject: Re: indexing performance 6.6 vs 7.1
> 
> Hi all,
> 
> Some follow-up (sorry for the delay).
> 
> We built a benchmark in our application, and profiled it (on a smallish
> data set). What we currently see in the profiler is that in Lucene 7.1 the
> calls to `commit()` take much longer.
> 
> The self-time committing in 6.6: 3,215 ms
> The self-time committing in 7.1: 10,187 ms.
> 
> We will try a larger data set, and later also a run with the IW info
> stream.
> 
> -Rob
> 
> On Thu, Jan 18, 2018 at 7:03 PM, Erick Erickson <erickerick...@gmail.com>
> wrote:
> 
> > Robert:
> >
> > Ah, right. I keep confusing my gmail lists
> > "lucene dev"
> > and
> > "lucene list"....
> >
> > Siiigggghhhhh.
> >
> >
> >
> > On Thu, Jan 18, 2018 at 9:18 AM, Adrien Grand <jpou...@gmail.com> wrote:
> > > If you have sparse data, I would have expected index time to *decrease*,
> > > not increase.
> > >
> > > Can you enable the IW info stream and share flush + merge times to see
> > > where indexing time goes?
> > >
> > > If you can run with a profiler, this might also give useful information.
> > >
> > > On Thu, Jan 18, 2018 at 11:23, Rob Audenaerde <rob.audenae...@gmail.com> wrote:
> > >
> > >> Hi all,
> > >>
> > >> We recently upgraded from Lucene 6.6 to 7.1. We see a significant drop in
> > >> indexing performance.
> > >>
> > >> We have an atypical use of Lucene, as we (also) index some database tables
> > >> and add all the values as AssociatedFacetFields as well. This allows us to
> > >> create pivot tables on search results really fast.
> > >>
> > >> These tables have some overlapping columns, but also disjoint ones.
> > >>
> > >> We anticipated a decrease in index size because of the sparse docvalues. We
> > >> see this happening, with decreases to ~50%-80% of the original index size.
> > >> But we did not expect a drop in indexing performance (indexing time on client
> > >> systems increased by 50% to 250%).
> > >>
> > >> (Our indexing speed used to be bound mainly by how fast the Taxonomy could
> > >> deliver ordinals for new values. We are currently investigating whether this
> > >> is still the case and will report back once a profiler run has been done.)
> > >>
> > >> Does anyone know whether this increase in indexing time is to be expected as
> > >> a result of the sparse docvalues change?
> > >>
> > >> Kind regards,
> > >>
> > >> Rob Audenaerde
> > >>
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> > For additional commands, e-mail: java-user-h...@lucene.apache.org
> >
> >
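Rob's setup, with database rows whose values are all added as association facet fields, might look roughly like the sketch below. It assumes lucene-core and lucene-facet 7.x on the classpath, with IntAssociationFacetField as the concrete association field type. It also assumes that each column name is a facet dimension and each non-empty cell value is its label. RowFacetIndexer, the association value, and the column names in any usage are hypothetical. The config.build(...) call looks up taxonomy ordinals and creates them for values not seen before. This is the step Rob suspected of limiting indexing speed.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.List;
import java.util.Map;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.facet.FacetsConfig;
import org.apache.lucene.facet.taxonomy.IntAssociationFacetField;
import org.apache.lucene.facet.taxonomy.directory.DirectoryTaxonomyWriter;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

// Hypothetical helper: one document per table row, every column is a facet dimension.
final class RowFacetIndexer {
    /** Indexes all rows, commits taxonomy and index once, returns the committed doc count. */
    static int indexRows(Path indexPath, Path taxoPath, List<Map<String, String>> rows, int association)
            throws IOException {
        FacetsConfig config = new FacetsConfig();
        try (Directory indexDir = FSDirectory.open(indexPath);
             Directory taxoDir = FSDirectory.open(taxoPath)) {
            try (IndexWriter writer = new IndexWriter(indexDir, new IndexWriterConfig(new StandardAnalyzer()));
                 DirectoryTaxonomyWriter taxoWriter = new DirectoryTaxonomyWriter(taxoDir)) {
                for (Map<String, String> row : rows) {
                    Document doc = new Document();
                    for (Map.Entry<String, String> cell : row.entrySet()) {
                        String value = cell.getValue();
                        if (value == null || value.isEmpty()) {
                            continue; // facet labels must not be null or empty; skip missing cells
                        }
                        doc.add(new IntAssociationFacetField(association, cell.getKey(), value));
                    }
                    // build() resolves each label to a taxonomy ordinal, adding unseen ones.
                    writer.addDocument(config.build(taxoWriter, doc));
                }
                // Commit the taxonomy first so the committed index never refers to unknown ordinals.
                taxoWriter.commit();
                writer.commit();
            }
            try (DirectoryReader reader = DirectoryReader.open(indexDir)) {
                return reader.numDocs();
            }
        }
    }
}
```

Timing the build(...) calls separately from addDocument(...) would be one way to check whether the taxonomy is still the bottleneck in 7.1.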

