On Wed, 2026-06-17 at 13:10 +0000, Karthikeyan KS wrote:
> This looks like a lot of heavily LLM-assisted effort. Please review the
> relevant documentation, starting here:
> https://docs.kernel.org/process/submitting-patches.html#using-assisted-by
>
> ==> I partly agree. The code and bug analysis are done manually.
> LLM use was the out of tree test harness and lightly polishing
> the commit message. None of the submitted code is generated.
> If you'd prefer, I can reword the changelog in my own words or
> add an Assisted-by tag ?
>

Thanks for the clarification. It's probably okay as-is in that case,
but that was unclear previously.

> I feel the testing strategy is pretty questionable. Any invariant
> violation is possible with that type of meddling.
>
> ==> The underlying bug is a kfifo SPSC contract violation. My intent with the
> test wasn't to simulate the race itself, but to reconstruct the post race
> state specifically where (in - out) exceeds the buffer size and show it
> causes a usercopy overflow in the unpatched code, handled safely after the
> fix.
>
> ==> I take your point that forcing that state can itself produce violations
> that wouldn't occur naturally. Since the bug is provable from the source but
> hard to trigger on demand, I'd rather ask what validation you'd accept here?

I'm aiming to build confidence that the change has been tested in
practice beyond spherical-cow circumstances. Isolating the conditions
this way seems okay, but I'd class the testing approach as
necessary-but-not-sufficient. It's important that the change is tested
under typical conditions, as well as atypical ones, to build confidence
against regressions.

>
> I was interested in whether you drove the interrupt sequence via
> emulated hardware. I asked because upstream qemu doesn't currently
> support the snoop device.
>
> ==> My apologies for the confusion, I mixed up things. I have not driven the
> interrupt sequence in emulation; as you noted, upstream QEMU doesn't model the
> snoop device. I've described the actual hardware context below.
>
> In v3 you said:
> The issue was observed on physical AST2600 (dual-core Cortex-A7)
> in production under heavy POST code traffic during concurrent
> userspace reads.
>
> https://lore.kernel.org/all/[email protected]/
> Is this true? What platform did you test with?
>
> ==> Yes, the underlying failure is real. It was observed on an AST2600-based
> production BMC running a vendor BSP kernel under continuous host reboot
> cycles.
> Because that platform can't currently be brought up on pure
> mainline without substantial out-of-tree board support, I have not run
> this exact mainline patch on the physical silicon, observed under the
> BSP kernel, not yet verified as the mainline patch. I should have made
> that distinction clear earlier in the thread.

Can you confirm that you have tested a backport of this patch to your
BSP kernel on hardware?

>
> ==> If there's a way you'd consider valid for validating a fix like this
> without a full mainline bring up on the SoC, such as a targeted kfifo unit
> test, or something else you'd accept. I'd appreciate the pointer and I'll
> do that.

No, I believe the change is fine, but the claim of testing under qemu
when qemu doesn't model the necessary hardware was a red flag that
needed to be addressed, doubly so in the absence of a track record of
upstream work.

Thanks,

Andrew

