Mike, I'm tempted to backport https://github.com/apache/lucene/pull/1068 to
branch_9_4, which is a bugfix that looks pretty safe to me. What do you
think?

On Tue, Sep 13, 2022 at 4:11 PM Mayya Sharipova
<mayya.sharip...@elastic.co.invalid> wrote:

> Thanks for running more tests, Michael.
> It is encouraging that you saw a similar performance between 9.3 and 9.4.
> I will also run more tests with different parameters.
>
> On Tue, Sep 13, 2022 at 9:30 AM Michael Sokolov <msoko...@gmail.com>
> wrote:
>
>> As a follow-up, I ran a test using the same parameters as above, only
>> changing M=200 to M=16. This did result in a single segment in both
>> cases (9.3, 9.4) and the performance was pretty similar; within noise
>> I think. The main difference I saw was that the 9.3 index was written
>> using CFS:
>>
>> 9.4:
>> recall  latency nDoc    fanout  maxConn beamWidth       visited index ms
>> 0.755    1.36   1000000 100     16      100     200     891402  1.00
>>  post-filter
>> -rw-r--r-- 1 sokolovm amazon 382M Sep 13 13:06
>> _0_Lucene94HnswVectorsFormat_0.vec
>> -rw-r--r-- 1 sokolovm amazon 262K Sep 13 13:06
>> _0_Lucene94HnswVectorsFormat_0.vem
>> -rw-r--r-- 1 sokolovm amazon 131M Sep 13 13:06
>> _0_Lucene94HnswVectorsFormat_0.vex
>>
>> 9.3:
>> recall  latency nDoc    fanout  maxConn beamWidth       visited index ms
>> 0.775    1.34   1000000 100     16      100     4033    977043
>> -rw-r--r-- 1 sokolovm amazon  297 Sep 13 13:26 _0.cfe
>> -rw-r--r-- 1 sokolovm amazon 516M Sep 13 13:26 _0.cfs
>> -rw-r--r-- 1 sokolovm amazon  340 Sep 13 13:26 _0.si
>>
>> On Tue, Sep 13, 2022 at 8:50 AM Michael Sokolov <msoko...@gmail.com>
>> wrote:
>> >
>> > I ran another test. I thought I had increased the RAM buffer size to
>> > 8G and heap to 16G. However I still see two segments in the index that
>> > was created. And looking at the infostream I see:
>> >
>> > dir=MMapDirectory@/local/home/sokolovm/workspace/knn-perf/glove-100-angular.hdf5-train-200-200.index
>> > lockFactory=org.apache.lucene.store.NativeFSLockFactory@4466af20
>> > index=
>> > version=9.4.0
>> > analyzer=org.apache.lucene.analysis.standard.StandardAnalyzer
>> > ramBufferSizeMB=8000.0
>> > maxBufferedDocs=-1
>> > ...
>> > perThreadHardLimitMB=1945
>> > ...
>> > DWPT 0 [2022-09-13T02:42:53.329404950Z; main]: flush postings as segment _6 numDocs=555373
>> > IW 0 [2022-09-13T02:42:53.330671171Z; main]: 0 msec to write norms
>> > IW 0 [2022-09-13T02:42:53.331113184Z; main]: 0 msec to write docValues
>> > IW 0 [2022-09-13T02:42:53.331320146Z; main]: 0 msec to write points
>> > IW 0 [2022-09-13T02:42:56.424195657Z; main]: 3092 msec to write vectors
>> > IW 0 [2022-09-13T02:42:56.429239944Z; main]: 4 msec to finish stored fields
>> > IW 0 [2022-09-13T02:42:56.429593512Z; main]: 0 msec to write postings and finish vectors
>> > IW 0 [2022-09-13T02:42:56.430309031Z; main]: 0 msec to write fieldInfos
>> > DWPT 0 [2022-09-13T02:42:56.431721622Z; main]: new segment has 0 deleted docs
>> > DWPT 0 [2022-09-13T02:42:56.431921144Z; main]: new segment has 0 soft-deleted docs
>> > DWPT 0 [2022-09-13T02:42:56.435738086Z; main]: new segment has no vectors; no norms; no docValues; no prox; freqs
>> > DWPT 0 [2022-09-13T02:42:56.435952356Z; main]: flushedFiles=[_6_Lucene94HnswVectorsFormat_0.vec, _6.fdm, _6.fdt, _6_Lucene94HnswVectorsFormat_0.vem, _6.fnm, _6.fdx, _6_Lucene94HnswVectorsFormat_0.vex]
>> > DWPT 0 [2022-09-13T02:42:56.436121861Z; main]: flushed codec=Lucene94
>> > DWPT 0 [2022-09-13T02:42:56.437691468Z; main]: flushed: segment=_6 ramUsed=1,945.002 MB newFlushedSize=1,065.701 MB docs/MB=521.134
>> >
>> > so I think it's this perThreadHardLimit that is triggering the
>> > flushes? TBH this isn't something I had seen before; but the docs say:
>> >
>> >   /**
>> >    * Expert: Sets the maximum memory consumption per thread triggering a forced flush if exceeded. A
>> >    * {@link DocumentsWriterPerThread} is forcefully flushed once it exceeds this limit even if the
>> >    * {@link #getRAMBufferSizeMB()} has not been exceeded. This is a safety limit to prevent a {@link
>> >    * DocumentsWriterPerThread} from address space exhaustion due to its internal 32 bit signed
>> >    * integer based memory addressing. The given value must be less than 2GB (2048MB)
>> >    *
>> >    * @see #DEFAULT_RAM_PER_THREAD_HARD_LIMIT_MB
>> >    */
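
[Editor's sketch, not part of the original message.] The numbers in the infostream above are enough for a rough sanity check of why two segments appear. The following is a minimal, stdlib-only Java sketch under loud assumptions: a single indexing thread, RAM use growing linearly with document count (only approximately true for HNSW graphs), and a flush happening once usage reaches the smaller of ramBufferSizeMB and perThreadHardLimitMB. The class and method names are hypothetical and are not Lucene APIs; the inputs (555373 docs at ramUsed=1,945.002 MB, nDoc=1000000, ramBufferSizeMB=8000, perThreadHardLimitMB=1945) are taken from the thread.

```java
public final class FlushEstimate {

    /**
     * Assumed effective flush threshold for a single indexing thread:
     * whichever of the global RAM buffer and the per-thread hard limit is hit first.
     */
    public static double effectiveLimitMB(double ramBufferSizeMB, int perThreadHardLimitMB) {
        return Math.min(ramBufferSizeMB, perThreadHardLimitMB);
    }

    /**
     * Rough estimate of how many flushed segments indexing totalDocs would produce,
     * extrapolating linearly from one observed flush (docsAtFlush docs at ramUsedAtFlushMB).
     */
    public static long estimateSegments(long totalDocs,
                                        long docsAtFlush,
                                        double ramUsedAtFlushMB,
                                        double ramBufferSizeMB,
                                        int perThreadHardLimitMB) {
        double mbPerDoc = ramUsedAtFlushMB / docsAtFlush;   // observed RAM cost per document
        double totalMB = mbPerDoc * totalDocs;               // linear extrapolation (assumption)
        double limitMB = effectiveLimitMB(ramBufferSizeMB, perThreadHardLimitMB);
        return (long) Math.ceil(totalMB / limitMB);
    }

    public static void main(String[] args) {
        // Values copied from the infostream and benchmark settings quoted in the thread.
        long segments = estimateSegments(1_000_000L, 555_373L, 1945.002, 8000.0, 1945);
        System.out.println("estimated flushed segments: " + segments);
    }
}
```

Under these assumptions the estimate is consistent with the two segments observed: raising ramBufferSizeMB to 8000 does not help while the per-thread limit stays at 1945 MB.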
>> >
>> > On Mon, Sep 12, 2022 at 6:28 PM Michael Sokolov <msoko...@gmail.com>
>> wrote:
>> > >
>> > > Hi Mayya, thanks for persisting - I think we need to wrestle this to
>> > > the ground for sure. In the test I ran, the RAM buffer was the default
>> > > checked in, which is, weirdly, 1994MB. I did not specifically set the heap
>> > > size. I used maxConn/M=200. I'll try with a larger buffer to see if I
>> > > can get 9.4 to produce a single segment for the same test settings. I
>> > > see you used a much smaller M (16), which should have produced quite
>> > > small graphs, and I agree, should have been a single segment. Were you
>> > > able to verify the number of segments?
>> > >
>> > > Agree that decrease in recall is not expected when more segments are
>> produced.
>> > >
>> > > On Mon, Sep 12, 2022 at 1:51 PM Mayya Sharipova
>> > > <mayya.sharip...@elastic.co.invalid> wrote:
>> > > >
>> > > > Hello Michael,
>> > > > Thanks for checking.
>> > > > Sorry for bringing this up again.
>> > > > First of all, I am ok with proceeding with the Lucene 9.4 release
>> and leaving the performance investigations for later.
>> > > >
>> > > > What maxConn/M value did you use for your tests? What were the
>> heap size and the RAM buffer size for indexing?
>> > > > Usually, when we have multiple segments, recall should increase,
>> not decrease. But I agree that with multiple segments we can see a big drop
>> in QPS.
>> > > >
>> > > > Here is my investigation with detailed output of the performance
>> difference between 9.3 and 9.4 releases. In my tests I used a large
>> indexing buffer (2GB) and large heap (5GB) to end up with a single segment
>> for both 9.3 and 9.4 tests, but still see a big drop in QPS in 9.4.
>> > > >
>> > > > Thank you.
>> > > >
>> > > >
>> > > >
>> > > >
>> > > >
>> > > > On Fri, Sep 9, 2022 at 12:21 PM Alan Woodward <romseyg...@gmail.com>
>> wrote:
>> > > >>
>> > > >> Done.  Thanks!
>> > > >>
>> > > >> > On 9 Sep 2022, at 16:32, Michael Sokolov <msoko...@gmail.com>
>> wrote:
>> > > >> >
>> > > >> > Hi Alan - I checked out the interval queries patch; seems pretty
>> safe,
>> > > >> > please go ahead and port to 9.4.  Thanks!
>> > > >> >
>> > > >> > Mike
>> > > >> >
>> > > >> > On Fri, Sep 9, 2022 at 10:41 AM Alan Woodward <
>> romseyg...@gmail.com> wrote:
>> > > >> >>
>> > > >> >> Hi Mike,
>> > > >> >>
>> > > >> >> I’ve opened https://github.com/apache/lucene/pull/11760 as a
>> small bug fix PR for a problem with interval queries.  Am I OK to port this
>> to the 9.4 branch?
>> > > >> >>
>> > > >> >> Thanks, Alan
>> > > >> >>
>> > > >> >> On 2 Sep 2022, at 20:42, Michael Sokolov <msoko...@gmail.com>
>> wrote:
>> > > >> >>
>> > > >> >> NOTICE:
>> > > >> >>
>> > > >> >> Branch branch_9_4 has been cut and versions updated to 9.5 on
>> stable branch.
>> > > >> >>
>> > > >> >> Please observe the normal rules:
>> > > >> >>
>> > > >> >> * No new features may be committed to the branch.
>> > > >> >> * Documentation patches, build patches and serious bug fixes
>> may be
>> > > >> >> committed to the branch. However, you should submit all patches
>> you
>> > > >> >> want to commit to Jira first to give others the chance to review
>> > > >> >> and possibly vote against the patch. Keep in mind that it is our
>> > > >> >> main intention to keep the branch as stable as possible.
>> > > >> >> * All patches that are intended for the branch should first be
>> committed
>> > > >> >> to the unstable branch, merged into the stable branch, and then
>> into
>> > > >> >> the current release branch.
>> > > >> >> * Normal unstable and stable branch development may continue as
>> usual.
>> > > >> >> However, if you plan to commit a big change to the unstable
>> branch
>> > > >> >> while the branch feature freeze is in effect, think twice:
>> can't the
>> > > >> >> addition wait a couple more days? Merges of bug fixes into the
>> branch
>> > > >> >> may become more difficult.
>> > > >> >> * Only Jira issues with Fix version 9.4 and priority "Blocker"
>> will delay
>> > > >> >> a release candidate build.
>> > > >> >>
>> > > >> >>
>> ---------------------------------------------------------------------
>> > > >> >> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> > > >> >> For additional commands, e-mail: dev-h...@lucene.apache.org
>> > > >> >>
>> > > >> >>

-- 
Adrien
