Hello,
I ran a git bisect between 9.4.2 and 9.5 and found the PR that affects my particular setup:
https://github.com/apache/lucene/pull/12093
This is the switch from UTF8TaxonomyWriterCache to LruTaxonomyWriterCache. I don't see a way to control the size of this cache so that it never evicts items and matches the previous behavior.
Marc
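To illustrate why a bounded LRU cache could produce a slowdown like this, here is a minimal toy model using only the JDK. It is not Lucene's LruTaxonomyWriterCache; the class name, the capacities, and the label count are all made-up values for illustration. It counts cache misses when a fixed set of labels is requested over and over, as happens when every document carries facet values drawn from the same pool.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy model only: a plain JDK LRU map, NOT Lucene's LruTaxonomyWriterCache. */
class LruEvictionDemo {

  /**
   * Requests labels 0..distinctLabels-1 in order, repeated for the given number
   * of passes, and returns how many requests missed an LRU cache of the given capacity.
   */
  static long countMisses(final int capacity, int distinctLabels, int passes) {
    // accessOrder=true keeps entries ordered from least to most recently used.
    Map<Integer, Integer> cache = new LinkedHashMap<Integer, Integer>(16, 0.75f, true) {
      @Override
      protected boolean removeEldestEntry(Map.Entry<Integer, Integer> eldest) {
        // Evict the least recently used entry once the cache exceeds its capacity.
        return size() > capacity;
      }
    };
    long misses = 0;
    for (int pass = 0; pass < passes; pass++) {
      for (int label = 0; label < distinctLabels; label++) {
        if (cache.get(label) == null) {
          misses++; // stands in for a slow lookup outside the cache
          cache.put(label, label);
        }
      }
    }
    return misses;
  }

  public static void main(String[] args) {
    int labels = 10_000; // hypothetical number of distinct facet labels
    int passes = 5;      // hypothetical number of times each label is seen again
    System.out.println("capacity 4000:  " + countMisses(4_000, labels, passes) + " misses");
    System.out.println("capacity 10000: " + countMisses(10_000, labels, passes) + " misses");
  }
}
```

With these numbers the smaller cache misses on all 50000 requests, because each label is evicted before it comes around again, while the cache that holds the whole label set misses only on the first 10000. Real access patterns are less regular than this, so the effect in practice would be smaller, but it shows why a cache that never evicts behaves so differently. As far as I know, DirectoryTaxonomyWriter also has a constructor that accepts a TaxonomyWriterCache, which may be a way to supply a larger cache; that is an assumption worth checking against the 9.5 Javadoc.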
On Fri, Apr 19, 2024 at 4:39 PM Marc Davenport <madavenp...@cargurus.com> wrote:

> Hello,
> Thanks for the leads. I haven't yet gone as far as doing a git bisect, but
> I have found that the big jump in time is in the call to
> facetsConfig.build(taxonomyWriter, doc); I made a quick and dirty
> instrumented version of the FacetsConfig class and found that calls to
> TaxonomyWriter.add(FacetLabel) are significantly slower for me.
>
> https://github.com/apache/lucene/blob/releases/lucene/9.5.0/lucene/facet/src/java/org/apache/lucene/facet/FacetsConfig.java#L383
>
> I don't know what is special about my documents that I would be seeing
> this change. I'm going to start dropping groups of our facets from the
> documents and seeing if there is some threshold that I'm hitting. I'll
> probably start with our hierarchies which are not particularly large, but
> are the most suspect.
>
> Thanks for any input,
> Marc
>
> 9.4.2
> Time(ms) per Document
> facetConfig.build : 0.9882365
> Taxo Add : 0.8334876
>
> 9.5
> facetConfig.build : 11.037549
> Taxo Add : 10.915726
>
> On Fri, Apr 19, 2024 at 2:56 AM Dawid Weiss <dawid.we...@gmail.com> wrote:
>
>> Hi Marc,
>>
>> You could try git bisect lucene repository to pinpoint the commit that
>> caused what you're observing. It'll take some time to build but it's a
>> logarithmic bisection and you'd know for sure where the problem is.
>>
>> D.
>>
>> On Thu, Apr 18, 2024 at 11:16 PM Marc Davenport
>> <madavenp...@cargurus.com.invalid> wrote:
>>
>> > Hi Adrien et al,
>> > I've been doing some investigation today and it looks like whatever the
>> > change is, it happens between 9.4.2 and 9.5.0.
>> > I made a smaller test set up for our code that mocks our documents and just
>> > runs through the indexing portion of our code sending in batches of 4k
>> > documents at a time. This way I can run it locally.
>> > 9.4.2: ~1200-2000 documents per second
>> > 9.5.0: ~150-400 documents per second
>> >
>> > I'll continue investigating, but nothing in the release notes jumped out to
>> > me.
>> > https://lucene.apache.org/core/9_10_0/changes/Changes.html#v9.5.0
>> >
>> > Sorry I don't have anything more rigorous yet. I'm doing this
>> > investigation in parallel with some other things.
>> > But any insight or suggestions on areas to look would be appreciated.
>> > Thank you,
>> > Marc
>> >
>> > On Wed, Apr 17, 2024 at 4:18 PM Adrien Grand <jpou...@gmail.com> wrote:
>> >
>> > > Hi Marc,
>> > >
>> > > Nothing jumps to mind as a potential cause for this 2x regression. It would
>> > > be interesting to look at a profile.
>> > >
>> > > On Wed, Apr 17, 2024 at 9:32 PM Marc Davenport
>> > > <madavenp...@cargurus.com.invalid> wrote:
>> > >
>> > > > Hello,
>> > > > I'm finally migrating Lucene from 8.11.2 to 9.10.0 as our overall build can
>> > > > now support Java 11. The quick first step of renaming packages and
>> > > > importing the new libraries has gone well. I'm even seeing a nice
>> > > > performance bump in our average query time. I am however seeing a dramatic
>> > > > increase in our indexing time. We are indexing ~3.1 million documents each
>> > > > with about 100 attributes used for facet filter, and sorting; no lexical
>> > > > text search. Our indexing time has jumped from ~1k seconds to ~2k
>> > > > seconds. I have yet to profile the individual aspects of how we convert
>> > > > our data to records vs time for the index writer to accept the documents.
>> > > > I'm curious if other users discovered this for their migrations at some
>> > > > point. Or if there are some changes to defaults that I did not see in the
>> > > > migration guide that would account for this? Looking at the logs I can see
>> > > > that as we are indexing the documents we commit every 10 minutes.
>> > > > Thank you,
>> > > > Marc
>> > > >
>> > >
>> > >
>> > > --
>> > > Adrien
>> > >
>> >
>>
>