Hello,

Thanks for the leads. I haven't gone as far as doing a git bisect yet, but I have found that the big jump in time is in the call to facetsConfig.build(taxonomyWriter, doc). I made a quick-and-dirty instrumented version of the FacetsConfig class and found that calls to TaxonomyWriter.addCategory(FacetLabel) are significantly slower for me.
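For reference, this kind of per-call timing can be sketched with a small accumulator wrapped around the suspect calls. The following is only an illustration, not the instrumented FacetsConfig itself; SectionTimer and its method names are hypothetical, and it uses only the JDK.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.Callable;

/** Hypothetical helper (not part of Lucene): accumulates wall-clock time per label. */
final class SectionTimer {
    // label -> {total elapsed nanoseconds, number of calls}
    private final Map<String, long[]> totals = new LinkedHashMap<>();

    /** Runs the action, adds its elapsed time to the label's total, returns its result. */
    public <T> T time(String label, Callable<T> action) {
        long start = System.nanoTime();
        try {
            return action.call();
        } catch (RuntimeException e) {
            throw e;
        } catch (Exception e) {
            // Checked exceptions (e.g. IOException) are rethrown unchecked.
            throw new RuntimeException(e);
        } finally {
            // Recorded even when the action throws.
            long[] slot = totals.computeIfAbsent(label, k -> new long[2]);
            slot[0] += System.nanoTime() - start;
            slot[1]++;
        }
    }

    /** Number of timed calls recorded under the label (0 if never used). */
    public long calls(String label) {
        long[] slot = totals.get(label);
        return slot == null ? 0 : slot[1];
    }

    /** Total time for the label in milliseconds, divided by the document count. */
    public double millisPer(String label, long documents) {
        long[] slot = totals.get(label);
        if (slot == null || documents <= 0) {
            return 0.0;
        }
        return slot[0] / 1_000_000.0 / documents;
    }
}
```

In an indexing loop, the call of interest would be wrapped as timer.time("facetConfig.build", () -> facetsConfig.build(taxonomyWriter, doc)), and millisPer("facetConfig.build", documentCount) would then give a time-per-document figure. The sketch is not thread-safe, so it assumes a single indexing thread.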
https://github.com/apache/lucene/blob/releases/lucene/9.5.0/lucene/facet/src/java/org/apache/lucene/facet/FacetsConfig.java#L383

I don't know what is special about my documents that I would be seeing this change. I'm going to start dropping groups of our facets from the documents and seeing if there is some threshold that I'm hitting. I'll probably start with our hierarchies which are not particularly large, but are the most suspect.

Thanks for any input,
Marc

Time (ms) per document

9.4.2
  facetConfig.build : 0.9882365
  Taxo Add          : 0.8334876

9.5
  facetConfig.build : 11.037549
  Taxo Add          : 10.915726

On Fri, Apr 19, 2024 at 2:56 AM Dawid Weiss <dawid.we...@gmail.com> wrote:

> Hi Marc,
>
> You could try git bisect lucene repository to pinpoint the commit that
> caused what you're observing. It'll take some time to build but it's a
> logarithmic bisection and you'd know for sure where the problem is.
>
> D.
>
> On Thu, Apr 18, 2024 at 11:16 PM Marc Davenport
> <madavenp...@cargurus.com.invalid> wrote:
>
> > Hi Adrien et al,
> > I've been doing some investigation today and it looks like whatever the
> > change is, it happens between 9.4.2 and 9.5.0.
> > I made a smaller test set up for our code that mocks our documents and
> > just runs through the indexing portion of our code sending in batches
> > of 4k documents at a time. This way I can run it locally.
> > 9.4.2: ~1200-2000 documents per second
> > 9.5.0: ~150-400 documents per second
> >
> > I'll continue investigating, but nothing in the release notes jumped
> > out to me.
> > https://lucene.apache.org/core/9_10_0/changes/Changes.html#v9.5.0
> >
> > Sorry I don't have anything more rigorous yet. I'm doing this
> > investigation in parallel with some other things.
> > But any insight or suggestions on areas to look would be appreciated.
> > Thank you,
> > Marc
> >
> > On Wed, Apr 17, 2024 at 4:18 PM Adrien Grand <jpou...@gmail.com> wrote:
> >
> > > Hi Marc,
> > >
> > > Nothing jumps to mind as a potential cause for this 2x regression. It
> > > would be interesting to look at a profile.
> > >
> > > On Wed, Apr 17, 2024 at 9:32 PM Marc Davenport
> > > <madavenp...@cargurus.com.invalid> wrote:
> > >
> > > > Hello,
> > > > I'm finally migrating Lucene from 8.11.2 to 9.10.0 as our overall
> > > > build can now support Java 11. The quick first step of renaming
> > > > packages and importing the new libraries has gone well. I'm even
> > > > seeing a nice performance bump in our average query time. I am
> > > > however seeing a dramatic increase in our indexing time. We are
> > > > indexing ~3.1 million documents each with about 100 attributes used
> > > > for facet filter, and sorting; no lexical text search. Our indexing
> > > > time has jumped from ~1k seconds to ~2k seconds. I have yet to
> > > > profile the individual aspects of how we convert our data to records
> > > > vs time for the index writer to accept the documents.
> > > > I'm curious if other users discovered this for their migrations at
> > > > some point. Or if there are some changes to defaults that I did not
> > > > see in the migration guide that would account for this? Looking at
> > > > the logs I can see that as we are indexing the documents we commit
> > > > every 10 minutes.
> > > > Thank you,
> > > > Marc
> > >
> > > --
> > > Adrien