On Wed, 2026-06-17 at 13:10 +0000, Karthikeyan KS wrote:
> This looks like a lot of heavily LLM-assisted effort. Please review the
> relevant documentation, starting here:
>    https://docs.kernel.org/process/submitting-patches.html#using-assisted-by
> 
> ==> I partly agree. The code and bug analysis are done manually.
> LLM use was the out of tree test harness and lightly polishing
> the commit message. None of the submitted code is generated.
> If you'd prefer, I can reword the changelog in my own words or
> add an Assisted-by tag ?
> 

Thanks for the clarification. It's probably okay as-is in that case,
but that was unclear previously.

> I feel the testing strategy is pretty questionable. Any invariant
> violation is possible with that type of meddling.
> 
> ==> The underlying bug is a kfifo SPSC contract violation. My intent with the
> test wasn't to simulate the race itself, but to reconstruct the post race 
> state
> specifically where (in - out) exceeds the buffer size and show it causes a
> usercopy overflow in the unpatched code, handled safely after the fix.
> 
> ==> I take your point that forcing that state can itself produce violations 
> that
> wouldn't occur naturally. Since the bug is provable from the source but hard 
> to
> trigger on demand, I'd rather ask what validation you'd accept here?

I'm aiming to build confidence that the change has been tested in
practice beyond spherical-cow circumstances. Isolating the conditions
this way seems okay, but I'd class the testing approach as necessary-
but-not-sufficient. It's important that the change is tested under
typical conditions to build confidence against regressions, as well as
atypical conditions.

> 
> I was interested in whether you drove the interrupt sequence via
> emulated hardware. I asked because upstream qemu doesn't currently
> support the snoop device.
> 
> ==> My apologies for the confusion, I mixed up things. I have not driven the
> interrupt sequence in emulation; as you noted, upstream QEMU doesn't model the
> snoop device. I've described the actual hardware context below.
> 
> In v3 you said:
>    The issue was observed on physical AST2600 (dual-core Cortex-A7)
>    in production under heavy POST code traffic during concurrent
>    userspace reads.
>    
> https://lore.kernel.org/all/[email protected]/
> Is this true? What platform did you test with?
> 
> ==> Yes, the underlying failure is real. It was observed on an AST2600-based
> production BMC running a vendor BSP kernel under continuous host reboot
> cycles. Because that platform can't currently be brought up on pure
> mainline without substantial out-of-tree board support, I have not run
> this exact mainline patch on the physical silicon, observed under the
> BSP kernel, not yet verified as the mainline patch. I should have made
> that distinction clear earlier in the thread.

Can you confirm you you have tested on hardware a backport of this
patch to your BSP kernel?

> 
> ==> If there's a way you'd consider valid for validating a fix like this
> without a full mainline bring up on the SoC, such as a targeted kfifo unit
> test, or something else you'd accept.I'd appreciate the pointer and I'll
> do that.

No, I believe the change is fine, but the claim of testing under qemu
when qemu doesn't model the necessary hardware was a red flag that
needed to be addressed, doubly so in the absence of your track record
of upstream work.

Thanks,

Andrew

Reply via email to