mikemccand commented on issue #15079:
URL: https://github.com/apache/lucene/issues/15079#issuecomment-3223688126

   > The 4 days of silence of the benchy machine made the analysis very 
complicated.
   
   I agree, sorry @uschindler -- benchmarking is hard!  We have some good data 
points now, four new runs since the wall-of-silence.
   
   > An alternative approach would be to autogenerate the code for the three 
instances with a simple python script....
   
   Oooh this is my favorite!
   
   > Here is another rewrite of the GroupVInt improvement without any hacks, 
just plain Lucene API usage.... 
[#15116](https://github.com/apache/lucene/pull/15116)
   
   Thanks for all the digging @uschindler!
   
   Unfortunately, another change I made was to enable `InfoStream` when building 
that vectors index, so that I could generate the segment trace (using the 
`infostream_to_segments.py` and `segments_to_html.py` tools in 
[luceneutil](https://github.com/mikemccand/luceneutil)).  This was first done 
in the 08/21 run (the one that ended the wall-of-silence).  InfoStream in the 
past hasn't changed indexing throughput measurably, so this shouldn't matter, 
but I wanted to call it out for full transparency.
   
   What I learned from that InfoStream, and its [horrifying segment 
trace](https://githubsearch.mikemccandless.com/nightly-vectors-index.html), is 
that this indexing run is not healthy -- a huge backlog of merges accumulates, 
but the merges never stall indexing.  Lucene's dynamic defaults allow a big 
merge backlog here:
   
   ```
   MS 0 [2025-08-23T01:23:51.617446691Z; main]: initDynamicDefaults 
maxThreadCount=64 maxMergeCount=69
   ```
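For anyone who wants to check these limits in their own InfoStream output, here is a minimal sketch that pulls the two values out of such a line. It assumes the line format shown above (based on that single sample), and `parse_merge_defaults` is a hypothetical helper, not something that exists in luceneutil:

```python
import re

# Pattern is an assumption derived from the one sample line above.
_DEFAULTS_RE = re.compile(
    r"initDynamicDefaults\s+maxThreadCount=(\d+)\s+maxMergeCount=(\d+)"
)


def parse_merge_defaults(line):
    """Return (max_thread_count, max_merge_count), or None if the line doesn't match."""
    m = _DEFAULTS_RE.search(line)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))


# The sample line from the InfoStream output, unwrapped onto one logical line.
sample = (
    "MS 0 [2025-08-23T01:23:51.617446691Z; main]: initDynamicDefaults "
    "maxThreadCount=64 maxMergeCount=69"
)

result = parse_merge_defaults(sample)
if result is not None:
    threads, merges = result
    print(f"maxThreadCount={threads} maxMergeCount={merges}")
```

Running the same function over every line of an InfoStream log would show whether the defaults were ever overridden during the run.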
   
   Net/net I'm not so sure I trust this stressful-ish benchy now ... I need to 
study the segment trace some more (also, this reveals some bugs in segment 
trace generation).
   
   Anyway, I will disable InfoStream for tonight's run, just to remove that 
possible variable -- since the indexing is so degenerate, maybe InfoStream is 
adding too much cost!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

