mikemccand commented on issue #15079: URL: https://github.com/apache/lucene/issues/15079#issuecomment-3223688126
> The 4 days of silence of the benchy machine made the analysis very complicated.

I agree, sorry @uschindler: benchmarking is hard! We have some good data points now, with four new runs since the wall-of-silence.

> An alternative approach would be to autogenerate the code for the three instances with a simple python script....

Oooh, this is my favorite!

> Here is another rewrite of the GroupVInt improvement without any hacks, just plain Lucene API usage.... [#15116](https://github.com/apache/lucene/pull/15116)

Thanks for all the digging, @uschindler!

Unfortunately, I made one other change: I enabled `InfoStream` when building that vectors index so that I could generate the segment trace (using the `infostream_to_segments.py` and `segments_to_html.py` tools in [luceneutil](https://github.com/mikemccand/luceneutil)). This was first done in the 08/21 run, the one that ended the wall-of-silence. In the past, InfoStream hasn't measurably changed indexing throughput, so this shouldn't matter, but I wanted to call it out for full transparency.

What I learned from that InfoStream, and its [horrifying segment trace](https://githubsearch.mikemccandless.com/nightly-vectors-index.html), is that this indexing run is not healthy: a huge backlog of merges accumulates, yet the merges never stall indexing. Lucene defaults to allowing a big backlog here:

```
MS 0 [2025-08-23T01:23:51.617446691Z; main]: initDynamicDefaults maxThreadCount=64 maxMergeCount=69
```

Net/net, I'm not so sure I trust this stressful-ish benchy now... I need to study the segment trace some more (it also reveals some bugs in segment trace generation). Anyway, I will disable InfoStream for tonight's run, just to remove that possible variable: since the indexing is so degenerate, maybe InfoStream is adding too much cost!
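The `initDynamicDefaults` line quoted above is easy to pull out of a full InfoStream log when comparing runs. Below is a minimal sketch that assumes only the line format shown in that log excerpt. The `merge_defaults` helper is hypothetical and is not part of luceneutil's `infostream_to_segments.py`. My understanding of `ConcurrentMergeScheduler` is that incoming indexing threads block only once `maxMergeCount` merges are pending. If that holds, a setting of 69 leaves room for a large backlog before any stall.

```python
import re

# Hypothetical helper (not part of luceneutil): matches the ConcurrentMergeScheduler
# "initDynamicDefaults" line in the format quoted in the comment, e.g.
# "MS 0 [...; main]: initDynamicDefaults maxThreadCount=64 maxMergeCount=69"
_DYNAMIC_DEFAULTS = re.compile(
    r"initDynamicDefaults maxThreadCount=(\d+) maxMergeCount=(\d+)"
)


def merge_defaults(lines):
    """Return a list of (maxThreadCount, maxMergeCount) tuples, one per matching line."""
    found = []
    for line in lines:
        m = _DYNAMIC_DEFAULTS.search(line)
        if m:
            found.append((int(m.group(1)), int(m.group(2))))
    return found


if __name__ == "__main__":
    # The single log line from the comment; a real InfoStream file would be read line by line.
    log = [
        "MS 0 [2025-08-23T01:23:51.617446691Z; main]: "
        "initDynamicDefaults maxThreadCount=64 maxMergeCount=69",
        "IW 0 [2025-08-23T01:23:51.700000000Z; main]: some unrelated line",
    ]
    for threads, merges in merge_defaults(log):
        # Difference between allowed pending merges and concurrently running merge threads.
        print(f"maxThreadCount={threads} maxMergeCount={merges} difference={merges - threads}")
```

For the logged values, the script reports a difference of 5 between the two settings. To read a real log, pass in an open file object instead of the hard-coded list.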