Thanks, Adrien. Did this FST behavior change between Lucene 7.x and 8.x?
Also, is the terms index no longer loaded into memory in 8.x?

On MMapDirectoryFactory: it is much faster, as you anticipated, but with
indexes that are commonly larger than 1 TB, the Windows machine freezes to
the point where I sometimes can't even connect to the VM.
SimpleFSDirectory works well for us from that standpoint.

To add, NIOFS and SimpleFS show similar indexing benchmarks on Windows.
I understand this is because of the Java bug that synchronizes internally
in the native call used by NIOFS.
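
To check that I follow the BufferedIndexInput explanation quoted below, here
is a minimal sketch of how I understand it. This is a toy model, not Lucene
code: ToyBufferedReader, readByteAt() and countRefills() are names I made up,
and the 8 kB input size is arbitrary. It assumes that a read outside the
current buffer refills forward from the requested position, which would make
byte-by-byte backward reads very expensive.

```java
public class BackwardReadDemo {

    /** Toy forward-filling buffered reader. An assumption-laden model, NOT Lucene's BufferedIndexInput. */
    static final class ToyBufferedReader {
        private final byte[] file;     // stands in for the on-disk file
        private final int bufferSize;  // e.g. 1024 for a 1 kB buffer
        private long bufferStart = 0;  // file offset of the first buffered byte
        private int bufferLength = 0;  // number of valid bytes currently buffered
        int refills = 0;               // how many times we had to go back to "disk"

        ToyBufferedReader(byte[] file, int bufferSize) {
            this.file = file;
            this.bufferSize = bufferSize;
        }

        byte readByteAt(long pos) {
            boolean miss = pos < bufferStart || pos >= bufferStart + bufferLength;
            if (miss) {
                // Refill starting AT the requested position and reading forward,
                // so the bytes just before pos are never in the buffer.
                bufferStart = pos;
                bufferLength = (int) Math.min(bufferSize, file.length - pos);
                refills++;
            }
            return file[(int) pos];
        }
    }

    /** Reads every byte once, forward or backward, and returns the refill count. */
    public static int countRefills(int fileSize, int bufferSize, boolean backwards) {
        ToyBufferedReader reader = new ToyBufferedReader(new byte[fileSize], bufferSize);
        for (int i = 0; i < fileSize; i++) {
            reader.readByteAt(backwards ? fileSize - 1 - i : i);
        }
        return reader.refills;
    }

    public static void main(String[] args) {
        System.out.println("forward refills:  " + countRefills(8192, 1024, false));
        System.out.println("backward refills: " + countRefills(8192, 1024, true));
    }
}
```

With a 1 kB buffer over an 8 kB input, this model does 8 refills when reading
forward and 8192 when reading backward, i.e. one refill per byte read.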

-Rahul

On Tue, Jun 6, 2023 at 9:32 AM Adrien Grand <jpou...@gmail.com> wrote:

> +Alan Woodward helped me better understand what is going on here.
> BufferedIndexInput (used by NIOFSDirectory and SimpleFSDirectory)
> doesn't play well with the fact that the FST reads bytes backwards:
> every call to readByte() triggers a refill of 1kB because it wants to
> read the byte that is just before what the buffer contains.
>
> On Tue, Jun 6, 2023 at 2:07 PM Adrien Grand <jpou...@gmail.com> wrote:
> >
> > My best guess based on your description of the issue is that
> > SimpleFSDirectory doesn't like the fact that the terms index now reads
> > data directly from the directory instead of loading the terms index in
> > heap. Would you be able to run the same benchmark with MMapDirectory
> > to check if it addresses the regression?
> >
> >
> > > On Tue, Jun 6, 2023 at 5:47 AM Rahul Goswami <rahul196...@gmail.com> wrote:
> > >
> > > Hello,
> > > We started experiencing slowness with atomic updates in Solr after
> > > upgrading from 7.7.2 to 8.11.1. Running several tests revealed the
> > > slowness to be in RealTimeGet's SolrIndexSearcher.getFirstMatch() call
> > > > which eventually calls Lucene's SegmentTermsEnum.seekExact().
> > >
> > > In the benchmarks I ran, 8.11.1 is about 10x slower than 7.7.2. After
> > > discussion on the Solr mailing list I created the below JIRA:
> > >
> > > https://issues.apache.org/jira/browse/SOLR-16838
> > >
> > > > The thread dumps collected show a lot of threads stuck in the
> > > > FST.findTargetArc() method.
> > > >
> > > > Environment details:
> > > > - Java 11 on Windows Server
> > > > - -Xms1536m -Xmx3072m
> > > > - Indexing client code running 15 parallel threads, indexing in batches
> > > >   of 1000 on a standalone core
> > > > - Using SimpleFSDirectoryFactory (since MMap doesn't quite work well on
> > > >   Windows for our index sizes, which commonly run north of 1 TB)
> > >
> > >
> https://drive.google.com/drive/folders/1q2DPNTYQEU6fi3NeXIKJhaoq3KPnms0h?usp=sharing
> > >
> > > > Is there a known issue with slowness in TermsEnum.seekExact() in
> > > > Lucene 8.x?
> > >
> > > Thanks,
> > > Rahul
> >
> >
> >
> > --
> > Adrien
>
>
>
> --
> Adrien
