mikemccand commented on issue #15662: URL: https://github.com/apache/lucene/issues/15662#issuecomment-3852878340
OK, benchy finished and published the Feb 4 2026 datapoint. But, alas, downgrading to the LTS kernel did NOT recover [the massive (~10% - 48%) QPS drop](https://benchmarks.mikemccandless.com/CountFilteredOrHighHigh.html).

However, I realized the test wasn't a perfect kernel revert:

* [Jan 22 run](https://benchmarks.mikemccandless.com/2026.01.22.18.03.42.html) (last good) was Linux 6.15.2-arch1-1 amd64
* [Jan 29 run](https://benchmarks.mikemccandless.com/2026.01.29.14.59.00.html) (first bad) was Linux 6.18.7-arch1-1 amd64
* Feb 04 run (last night's run, with the kernel revert) was Linux 6.12.67-1-lts #1 SMP PREEMPT_DYNAMIC, which is older than the Jan 22 kernel

Still, since even this older kernel shows the drop, I think we can conclude the kernel upgrade wasn't the cause. I'm going to reboot back to the 6.18.7 kernel and revert [recent Lucene changes in the Jan 22 - Jan 29 window](https://github.com/apache/lucene/compare/2f9aa8ae26d6c1087884c734e1b3d137bd8c6601...6a6b753e3080725921a07b0a214963d8ff639eea).

As a first attempt (of this manual and slow-to-iterate `git bisect`), I'll roll Lucene back to https://github.com/apache/lucene/commit/0a699fa4d2db9741c699a01c1b731bbd66613b6a, which is just before we landed [the shared (`AtomicInteger`) prefetch counter](https://github.com/apache/lucene/pull/15585). I'm now more worried about the dreaded MESI ping-pong situation, where the CPU's cross-core cache-coherence protocol ([MESI](https://en.wikipedia.org/wiki/MESI_protocol)) is stressed out by all the volatile reads/writes hiding under the shared `AtomicInteger`!! (from my comment [here](https://github.com/apache/lucene/pull/15585#issuecomment-3820113225) where I [asked Claude](https://claude.ai/share/a9ddcc42-871c-4959-beac-c7b1a90bae0b)).
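To make the suspected failure mode concrete, here is a minimal Java sketch. It is hypothetical illustration code, not Lucene's prefetch implementation from PR #15585, and the names `CounterContentionSketch`, `runShared` and `runStriped` are invented here. It contrasts many threads incrementing one shared `AtomicInteger` (each write needs exclusive ownership of the cache line holding the value, so the other cores' copies get invalidated and the line bounces between cores) with the JDK's `java.util.concurrent.atomic.LongAdder`, which spreads contended updates across internal cells. `LongAdder` is only a swapped-in mitigation for illustration, not something Lucene uses or has proposed here.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical illustration, NOT Lucene's prefetch code: it only mimics the
// pattern of many threads updating one shared AtomicInteger.
public class CounterContentionSketch {

  // Starts `threads` threads that all run `body`, then waits for all of them.
  private static void runAll(int threads, Runnable body) {
    Thread[] workers = new Thread[threads];
    for (int t = 0; t < threads; t++) {
      workers[t] = new Thread(body);
      workers[t].start();
    }
    try {
      for (Thread w : workers) {
        w.join();
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while joining workers", e);
    }
  }

  // All threads increment the SAME AtomicInteger. Each incrementAndGet() must
  // own the value's cache line exclusively (MESI E/M state) on the calling
  // core, invalidating the other cores' copies: the "ping-pong".
  static long runShared(int threads, int incrementsPerThread) {
    AtomicInteger shared = new AtomicInteger();
    runAll(threads, () -> {
      for (int i = 0; i < incrementsPerThread; i++) {
        shared.incrementAndGet();
      }
    });
    return shared.get();
  }

  // Same total work, but LongAdder spreads contended updates over internal
  // (padded) cells; sum() adds the cells together at the end.
  static long runStriped(int threads, int incrementsPerThread) {
    LongAdder adder = new LongAdder();
    runAll(threads, () -> {
      for (int i = 0; i < incrementsPerThread; i++) {
        adder.increment();
      }
    });
    return adder.sum();
  }

  public static void main(String[] args) {
    int threads = Runtime.getRuntime().availableProcessors();
    int perThread = 1_000_000; // kept small so threads * perThread fits in an int

    long start = System.nanoTime();
    long shared = runShared(threads, perThread);
    long sharedMs = (System.nanoTime() - start) / 1_000_000;

    start = System.nanoTime();
    long striped = runStriped(threads, perThread);
    long stripedMs = (System.nanoTime() - start) / 1_000_000;

    System.out.printf("threads=%d shared=%d (%d ms) striped=%d (%d ms)%n",
        threads, shared, sharedMs, striped, stripedMs);
  }
}
```

Running `main` prints both totals, which should always match, plus rough wall-clock times. On a many-core box the shared variant would be expected to fall further behind as the thread count grows, but this is a naive timing loop (no warmup, no JMH), so any numbers are indicative only. Also, `LongAdder.sum()` is more expensive than `AtomicInteger.get()`, so it is not a drop-in replacement if the current count must be read often; whether the prefetch code needs that is not established here.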
Even though that change (shared `AtomicInteger`) is sometimes a win (per @shubhamsrkdev's benchmark results -- thank you for testing & retesting), the many cores (64 real, 128 hyperthreaded) in the [`beast3` nightly benchmarking box](https://blog.mikemccandless.com/2021/01/apache-lucene-performance-on-128-core.html) (sheesh, five years ago now -- that equates to a 150-year-old human!! (one computer year maps to 30 human years)) surely increase the cost/risk of the cache-invalidation ping-ponging. I'll go reopen this discussion with Claude too...

(Or, it could be one of the (many!) other system packages that were upgraded -- I'll put the full list here.)
