mikemccand commented on issue #15662:
URL: https://github.com/apache/lucene/issues/15662#issuecomment-3852878340

   OK, benchy finished and published the Feb 4 2026 datapoint.
   
   But, alas, downgrading to the LTS kernel did NOT recover [the massive 
(~10% - 48%) QPS 
drop](https://benchmarks.mikemccandless.com/CountFilteredOrHighHigh.html).
   
   However, I realized the test wasn't a perfect kernel revert:
   
    * [Jan 22 
run](https://benchmarks.mikemccandless.com/2026.01.22.18.03.42.html) (last 
good) was Linux 6.15.2-arch1-1 amd64
    * [Jan 29 
run](https://benchmarks.mikemccandless.com/2026.01.29.14.59.00.html) (first 
bad) was Linux 6.18.7-arch1-1 amd64
    * [Feb 04 run] (last night's run with kernel revert) was Linux 
6.12.67-1-lts #1 SMP PREEMPT_DYNAMIC, which is older than the Jan 22 kernel
   
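   Still, I think we can conclude the kernel upgrade wasn't the cause: a 
regression introduced somewhere between 6.15.2 and 6.18.7 should not normally 
show up on the even older 6.12.67 LTS kernel.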
   
   I'm going to reboot back to the 6.18.7 kernel and revert [recent Lucene 
changes in the Jan 22 - Jan 29 
window](https://github.com/apache/lucene/compare/2f9aa8ae26d6c1087884c734e1b3d137bd8c6601...6a6b753e3080725921a07b0a214963d8ff639eea).
   
   As a first attempt (of this manual and slow-to-iterate `git bisect`) I'll roll 
Lucene back to 
https://github.com/apache/lucene/commit/0a699fa4d2db9741c699a01c1b731bbd66613b6a,
 which is just before we landed [the shared (`AtomicInteger`) prefetch 
counter](https://github.com/apache/lucene/pull/15585).  I'm now more worried 
about the dreaded MESI ping-pong situation, where the CPU's cross-core cache 
coherence protocol ([MESI](https://en.wikipedia.org/wiki/MESI_protocol)) is 
stressed out by all the volatile reads/writes hiding under the shared 
`AtomicInteger`!!  (See my comment 
[here](https://github.com/apache/lucene/pull/15585#issuecomment-3820113225), 
where I [asked 
Claude](https://claude.ai/share/a9ddcc42-871c-4959-beac-c7b1a90bae0b).)
   
   Even though that change (shared `AtomicInteger`) is a win sometimes (from 
@shubhamsrkdev's benchmark results -- thank you for testing & retesting), the 
many (64 real, 128 hyperthreaded) cores in the [`beast3` nightly benchmarking 
box](https://blog.mikemccandless.com/2021/01/apache-lucene-performance-on-128-core.html)
 (sheesh, five years ago now -- that equates to a 150-year-old human!! (one 
computer year maps to 30 human years)) surely increase the cost/risk of the 
cache invalidation ping-ponging.  I'll go reopen this discussion with Claude 
too...
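
   To make the ping-pong concern concrete, here is a minimal, hypothetical 
sketch (not Lucene's prefetch code, and not a rigorous benchmark; something 
like JMH would be needed for trustworthy numbers).  It has every thread hammer 
one shared `AtomicInteger`, then repeats the same work with a `LongAdder`, 
which spreads updates over several padded cells under contention.  The class 
and method names (`CounterContentionSketch`, `sharedCounter`, 
`stripedCounter`) are made up for illustration, and `LongAdder` is only a 
stand-in for "less shared state"; it may not fit a counter whose exact value 
must be read on every update.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical illustration only: none of these names exist in Lucene.
final class CounterContentionSketch {

  /** All threads update one AtomicInteger, so they all fight over one cache line. */
  public static long sharedCounter(int threads, int incrementsPerThread) {
    AtomicInteger counter = new AtomicInteger();
    runInParallel(threads, () -> {
      for (int i = 0; i < incrementsPerThread; i++) {
        // Atomic read-modify-write: the core must own the cache line exclusively,
        // invalidating the copies held by every other core.
        counter.incrementAndGet();
      }
    });
    return counter.get();
  }

  /** Same work with LongAdder, which stripes updates across cells when contended. */
  public static long stripedCounter(int threads, int incrementsPerThread) {
    LongAdder counter = new LongAdder();
    runInParallel(threads, () -> {
      for (int i = 0; i < incrementsPerThread; i++) {
        counter.increment();
      }
    });
    // Exact here because all writer threads have been joined before summing.
    return counter.sum();
  }

  /** Starts `threads` threads running the same task and waits for all of them. */
  private static void runInParallel(int threads, Runnable task) {
    Thread[] workers = new Thread[threads];
    for (int t = 0; t < threads; t++) {
      workers[t] = new Thread(task);
      workers[t].start();
    }
    try {
      for (Thread worker : workers) {
        worker.join();
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    int threads = Runtime.getRuntime().availableProcessors();
    int increments = 1_000_000;
    long t0 = System.nanoTime();
    long shared = sharedCounter(threads, increments);
    long t1 = System.nanoTime();
    long striped = stripedCounter(threads, increments);
    long t2 = System.nanoTime();
    System.out.printf("shared AtomicInteger: count=%d, %d ms%n", shared, (t1 - t0) / 1_000_000);
    System.out.printf("LongAdder:            count=%d, %d ms%n", striped, (t2 - t1) / 1_000_000);
  }
}
```

   Both variants produce the same total; only the wall-clock time differs, and 
the gap would be expected to grow with core count, which is why a 128-thread 
box is a plausible place for such a regression to surface.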
   
   (Or, it could be one of the (many!) other system packages that were upgraded 
-- I'll put the full list here).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

