mikemccand commented on issue #15662: URL: https://github.com/apache/lucene/issues/15662#issuecomment-3946125768
`beast3` benchmarking lives!!!

Yesterday's end-to-end benchmark run finally succeeded again, on downgraded packages, downgraded JDK (25.0.1), and recent Lucene sources ([only 77 Lucene changes](https://github.com/apache/lucene/compare/2f9aa8ae26d6c1087884c734e1b3d137bd8c6601...338a79181f0347ce7ba39e0210341c38afbfdbe9) since the previous successful benchy run, heh). The results are not yet trustworthy -- I have `FCLK` mis-configured on the current `beast3` boot -- I'll fix that, re-update the box to the latest Arch Linux, get benchy running again each night, and pray that in those 77 Lucene changes, or the Arch Linux package changes, there is not another regression.

The smoking gun was the [CPU governor](https://wiki.archlinux.org/title/CPU_frequency_scaling) mixed with a too-old BIOS! Somehow the governor switched somewhere in that Jan 22 - 29 window, but then the driver `amd_pstate` (which actually interacts w/ the CPU cores to read/write targets/limits) was unable to interact with the too-old BIOS -- all errors trying to query each CPU's capabilities -- so it fell back to godawful slow safe defaults. Annoyingly, that CPU governor change stuck even with attempted whole-system downgrades.

Claude was great fun in iterating theories, testing them, and teaching me all sorts of wild Linux tooling to inspect every last detail about your hardware ([`turbostat`](https://archlinux.org/packages/?name=turbostat), [`cpupower`](https://archlinux.org/packages/?name=cpupower), `/sys/devices/*`, [`mcelog`](https://mcelog.org/), [`decode-dimms`](https://man.archlinux.org/man/decode-dimms.1.en), [`numactl`](https://man.archlinux.org/man/numactl.8.en), [`dmidecode`](https://man.archlinux.org/man/dmidecode.8.en), [`htop`](https://man.archlinux.org/man/htop.1.en), [`btop`](https://github.com/aristocratos/btop), [`s-tui`](https://github.com/amanusk/s-tui) (<-- phew this was able to get all 128 cores maxed out!! oh the amps of DC going into the CPU... sheesh. nothing seemed to melt.), [`nvme`](https://man.archlinux.org/man/nvme.1), ...). Claude does a pretty good job understanding photos -- so I would boot to BIOS, take pictures for Claude, and Claude would tell me which setting to fix or dive into next, and where. I took pictures of my hardware and it told me which components they were, e.g. the pump for the water cooler, the open case/frame.

See the blow-by-blow with Claude: [here](https://claude.ai/share/dae1030c-0ecb-491b-8166-f391334ffec9), [here](https://claude.ai/share/dba21376-c29e-4526-a597-7a4ba9d1e5d3), [here](https://claude.ai/share/0f96e528-bf2e-4fdd-a096-c3575dcd94ca), [here](https://claude.ai/share/01461d59-34c8-4b42-96ec-6ffdea27b6d2), [here](https://claude.ai/share/4eb4b79f-e54a-480c-9942-4e338290c915) (sheesh there are more, I'll stop).

I made these changes:

* Upgraded to modern BIOS, `amd_pstate_epp` is able to talk to CPU cores now
* Governor is now wired to performance, boost is enabled/active
* I turned on all fans to max (there was a handy switch on the motherboard). Thermal throttling was never happening (not logged anyways), but some temps were hot, so ... also added an external house fan for good measure
* Discovered, insanely, that I failed to pull the plastic off the thermal paste inside the motherboard's cover for the NVMe drives, sheesh. It didn't cause problems (no thermal throttling) but made the NVMe SSDs run hot (though they are not holding the index -- that's the Intel Optane PCIe card)
* Also discovered I had not plugged in additional power for PCIe -- it's likely that doesn't matter -- the two extra motherboard power plugs for the CPU are plugged in. Still, Claude thinks it's possible my power supply is under-spec'd ... I'll swap in an upgrade and see if it moves the needle ... unlikely

I plan to add additional logging to benchy's nightly artifacts to monitor CPU freqs / turbo too, and add more health metrics for statuscake to help me watch.
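
For anyone wanting a similar sanity check on their own box, here is a minimal sketch (not what benchy actually does) of capturing the cpufreq driver, governor, and current frequency per core from the standard Linux sysfs files. The function names and the `performance` default are hypothetical choices for illustration; the `scaling_driver`, `scaling_governor`, and `scaling_cur_freq` files are the usual cpufreq sysfs attributes, and not every driver exposes all of them, so missing files are recorded as `None`.

```python
from pathlib import Path


def read_cpufreq_state(sysfs_root="/sys/devices/system/cpu"):
    """Return {cpu_name: {attribute: value}} for each CPU with a cpufreq directory.

    sysfs_root is a parameter so the function can be pointed at a fake tree.
    """
    state = {}
    for cpufreq_dir in sorted(Path(sysfs_root).glob("cpu[0-9]*/cpufreq")):
        attrs = {}
        for name in ("scaling_driver", "scaling_governor", "scaling_cur_freq"):
            try:
                attrs[name] = (cpufreq_dir / name).read_text().strip()
            except OSError:
                # Attribute missing or unreadable on this system.
                attrs[name] = None
        state[cpufreq_dir.parent.name] = attrs
    return state


def find_unexpected_governors(state, expected="performance"):
    """Return the names of CPUs whose governor differs from the expected one."""
    return sorted(
        cpu for cpu, attrs in state.items()
        if attrs["scaling_governor"] != expected
    )


if __name__ == "__main__":
    state = read_cpufreq_state()
    for cpu, attrs in state.items():
        # scaling_cur_freq is reported in kHz.
        print(cpu, attrs["scaling_driver"], attrs["scaling_governor"],
              attrs["scaling_cur_freq"])
    bad = find_unexpected_governors(state)
    if bad:
        print("WARNING: unexpected governor on:", ", ".join(bad))
```

Writing this output into each night's artifacts would make a silent governor or driver switch, like the one in that Jan 22 - 29 window, show up as a diff between runs.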
