Hi Adrien et al,
I've been doing some investigation today, and it looks like whatever
changed, it happened between 9.4.2 and 9.5.0.
I made a smaller test setup for our code that mocks our documents and runs
only the indexing portion of our code, sending in batches of 4k documents
at a time. This way I can run it locally.
9.4.2: ~1200-2000 documents per second
9.5.0: ~150-400 documents per second
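
For anyone who wants to reproduce this kind of measurement, here is a minimal
sketch of a batch-throughput harness. It is not the actual test code from this
thread: the class name, the mock document shape (a map of attribute names to
values), and the sink callback are all hypothetical. It assumes that in a real
run the sink would convert each batch to Lucene documents and pass them to
IndexWriter.addDocuments; here a plain counter stands in so the sketch runs
without Lucene on the classpath. Only the time spent inside the sink is
counted, so mock generation does not skew the rate.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

public class IndexingThroughput {

    // Builds one batch of mock documents. Each document is a map of
    // attribute name to value (a hypothetical stand-in for real records).
    public static List<Map<String, String>> mockBatch(int firstId, int size, int attrs) {
        List<Map<String, String>> batch = new ArrayList<>(size);
        for (int i = 0; i < size; i++) {
            Map<String, String> doc = new HashMap<>();
            doc.put("id", Integer.toString(firstId + i));
            for (int a = 0; a < attrs; a++) {
                doc.put("attr" + a, "value" + ((firstId + i + a) % 1000));
            }
            batch.add(doc);
        }
        return batch;
    }

    // Feeds totalDocs mock documents to the sink in batches of batchSize
    // (must be positive) and returns documents per second, timing only
    // the sink calls.
    public static double measureDocsPerSecond(int totalDocs, int batchSize, int attrs,
            Consumer<List<Map<String, String>>> sink) {
        long sinkNanos = 0;
        int indexed = 0;
        while (indexed < totalDocs) {
            int size = Math.min(batchSize, totalDocs - indexed);
            List<Map<String, String>> batch = mockBatch(indexed, size, attrs);
            long start = System.nanoTime();
            sink.accept(batch); // assumption: real code would index the batch here
            sinkNanos += System.nanoTime() - start;
            indexed += size;
        }
        return indexed / (Math.max(1L, sinkNanos) / 1e9);
    }

    public static void main(String[] args) {
        long[] seen = {0};
        // Stand-in sink that only counts documents; swap in real indexing to compare versions.
        double rate = measureDocsPerSecond(100_000, 4_000, 100, batch -> seen[0] += batch.size());
        System.out.printf("%d docs, %.0f docs/sec%n", seen[0], rate);
    }
}
```

Running the same harness against each Lucene version, with only the library
jar changed, keeps the comparison limited to the indexing path.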

I'll continue investigating, but nothing in the release notes jumped out at
me.
https://lucene.apache.org/core/9_10_0/changes/Changes.html#v9.5.0

Sorry I don't have anything more rigorous yet; I'm doing this
investigation in parallel with some other things.
Any insight or suggestions on where to look would be appreciated.
Thank you,
Marc

On Wed, Apr 17, 2024 at 4:18 PM Adrien Grand <jpou...@gmail.com> wrote:

> Hi Marc,
>
> Nothing jumps to mind as a potential cause for this 2x regression. It would
> be interesting to look at a profile.
>
> On Wed, Apr 17, 2024 at 9:32 PM Marc Davenport
> <madavenp...@cargurus.com.invalid> wrote:
>
> > Hello,
> > I'm finally migrating Lucene from 8.11.2 to 9.10.0 as our overall build
> > can now support Java 11. The quick first step of renaming packages and
> > importing the new libraries has gone well.  I'm even seeing a nice
> > performance bump in our average query time. I am however seeing a
> > dramatic increase in our indexing time.  We are indexing ~3.1 million
> > documents each with about 100 attributes used for facet filter, and
> > sorting; no lexical text search.  Our indexing time has jumped from ~1k
> > seconds to ~2k seconds.  I have yet to profile the individual aspects of
> > how we convert our data to records vs time for the index writer to
> > accept the documents. I'm curious if other users discovered this for
> > their migrations at some point.  Or if there are some changes to
> > defaults that I did not see in the migration guide that would account
> > for this?  Looking at the logs I can see that as we are indexing the
> > documents we commit every 10 minutes.
> > Thank you,
> > Marc
> >
>
>
> --
> Adrien
>