Re: Indexing time increase moving from Lucene 8 to 9

Marc Davenport Mon, 22 Apr 2024 13:27:50 -0700

Hello,
I've done a bisect between 9.4.2 and 9.5 and found the PR affecting my
particular setup: https://github.com/apache/lucene/pull/12093
This is the switch from UTF8TaxonomyWriterCache to an
LruTaxonomyWriterCache.  I don't see a way to control the size of this
cache so that it never evicts items and matches the previous behavior.
Marc
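
To illustrate why a bounded LRU cache could produce the kind of slowdown
described above, here is a toy sketch in plain Java. It is not Lucene's
implementation: LruThrashDemo, LruCache, hitRate, the label format, and the
capacity of 1000 are all hypothetical names and values chosen for the
example. It cycles through a fixed set of labels several times and reports
the fraction of lookups served from the cache.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LruThrashDemo {

    /** Toy LRU map (not Lucene code): access-ordered, evicts the eldest entry when full. */
    static final class LruCache<K, V> extends LinkedHashMap<K, V> {
        private final int capacity;

        LruCache(int capacity) {
            super(16, 0.75f, true); // third argument true = iterate in access order
            this.capacity = capacity;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            // Called after each insertion; returning true drops the least recently used entry.
            return size() > capacity;
        }
    }

    /** Cycles through distinctLabels keys for the given number of passes; returns hits / lookups. */
    static double hitRate(int capacity, int distinctLabels, int passes) {
        LruCache<String, Integer> cache = new LruCache<>(capacity);
        long hits = 0;
        long lookups = 0;
        for (int p = 0; p < passes; p++) {
            for (int i = 0; i < distinctLabels; i++) {
                String label = "dim/value-" + i; // hypothetical facet label
                lookups++;
                if (cache.get(label) != null) {
                    hits++;
                } else {
                    cache.put(label, i); // a miss: stands in for the slow lookup path
                }
            }
        }
        return (double) hits / lookups;
    }

    public static void main(String[] args) {
        // Working set fits in the cache: only the first pass misses.
        System.out.println("fits:     " + hitRate(1000, 800, 5));
        // Working set slightly larger than the cache: cyclic access evicts
        // each entry just before it is needed again.
        System.out.println("overflow: " + hitRate(1000, 1200, 5));
    }
}
```

In this toy model the hit rate collapses as soon as the set of distinct
labels exceeds the capacity, while an unbounded cache would miss only once
per label. Whether that is what happens in the real taxonomy writer would
need a profile to confirm. If the Lucene version in use offers a
DirectoryTaxonomyWriter constructor that accepts a TaxonomyWriterCache
(worth checking against the Javadoc), passing an LruTaxonomyWriterCache
sized above the number of distinct facet labels might be one thing to try.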



On Fri, Apr 19, 2024 at 4:39 PM Marc Davenport <madavenp...@cargurus.com>
wrote:

> Hello,
> Thanks for the leads. I haven't yet gone as far as doing a git bisect, but
> I have found that the big jump in time is in the call to
> facetsConfig.build(taxonomyWriter, doc);  I made a quick and dirty
> instrumented version of the FacetsConfig class and found that calls to
> TaxonomyWriter.add(FacetLabel) are significantly slower for me.
>
>
> https://github.com/apache/lucene/blob/releases/lucene/9.5.0/lucene/facet/src/java/org/apache/lucene/facet/FacetsConfig.java#L383
>
> I don't know what is special about my documents that I would be seeing
> this change.  I'm going to start dropping groups of our facets from the
> documents and seeing if there is some threshold that I'm hitting.  I'll
> probably start with our hierarchies which are not particularly large, but
> are the most suspect.
>
> Thanks for any input,
> Marc
>
> Time (ms) per document
>                       9.4.2        9.5
> facetsConfig.build    0.9882365    11.037549
> Taxo Add              0.8334876    10.915726
>
> On Fri, Apr 19, 2024 at 2:56 AM Dawid Weiss <dawid.we...@gmail.com> wrote:
>
>> Hi Marc,
>>
>> You could try running git bisect on the Lucene repository to pinpoint the
>> commit that caused what you're observing. It'll take some time to build,
>> but it's a logarithmic bisection and you'd know for sure where the problem
>> is.
>>
>> D.
>>
>> On Thu, Apr 18, 2024 at 11:16 PM Marc Davenport
>> <madavenp...@cargurus.com.invalid> wrote:
>>
>> > Hi Adrien et al,
>> > I've been doing some investigation today and it looks like whatever the
>> > change is, it happens between 9.4.2 and 9.5.0.
>> > I made a smaller test setup for our code that mocks our documents and
>> > just runs through the indexing portion of our code, sending in batches
>> > of 4k documents at a time. This way I can run it locally.
>> > 9.4.2: ~1200-2000 documents per second
>> > 9.5.0: ~150-400 documents per second
>> >
>> > I'll continue investigating, but nothing in the release notes jumped
>> > out at me.
>> > https://lucene.apache.org/core/9_10_0/changes/Changes.html#v9.5.0
>> >
>> > Sorry I don't have anything more rigorous yet.  I'm doing this
>> > investigation in parallel with some other things.
>> > But any insight or suggestions on areas to look would be appreciated.
>> > Thank you,
>> > Marc
>> >
>> > On Wed, Apr 17, 2024 at 4:18 PM Adrien Grand <jpou...@gmail.com> wrote:
>> >
>> > > Hi Marc,
>> > >
>> > > Nothing jumps to mind as a potential cause for this 2x regression. It
>> > > would be interesting to look at a profile.
>> > >
>> > > On Wed, Apr 17, 2024 at 9:32 PM Marc Davenport
>> > > <madavenp...@cargurus.com.invalid> wrote:
>> > >
>> > > > Hello,
>> > > > I'm finally migrating Lucene from 8.11.2 to 9.10.0 as our overall
>> > > > build can now support Java 11. The quick first step of renaming
>> > > > packages and importing the new libraries has gone well.  I'm even
>> > > > seeing a nice performance bump in our average query time. I am,
>> > > > however, seeing a dramatic increase in our indexing time.  We are
>> > > > indexing ~3.1 million documents, each with about 100 attributes used
>> > > > for faceting, filtering, and sorting; no lexical text search.  Our
>> > > > indexing time has jumped from ~1k seconds to ~2k seconds.  I have yet
>> > > > to profile the individual aspects of how we convert our data to
>> > > > records vs. the time for the index writer to accept the documents.
>> > > > I'm curious if other users discovered this for their migrations at
>> > > > some point.  Or are there some changes to defaults that I did not see
>> > > > in the migration guide that would account for this?  Looking at the
>> > > > logs I can see that as we are indexing the documents we commit every
>> > > > 10 minutes.
>> > > > Thank you,
>> > > > Marc
>> > > >
>> > >
>> > >
>> > > --
>> > > Adrien
>> > >
>> >
>>
>
