On 2025-01-09 13:46:47 [-0300], Wander Lairson Costa wrote:
> > If the issue is indeed the use of threaded interrupts then the fix
> > should not be limited to be PREEMPT_RT only.
> >
> Although I was not aware of this scenario, the patch should work for it as
> well, as I am forcing it to run in interrupt context. I will test it to confirm.

If I remember correctly there were "ifdef preempt_rt" things in it.

> > > > - What causes the failure? I see you reworked into two parts to behave
> > > > similar to what happens without threaded interrupts. There is still no
> > > > explanation for it. Is there a timing limit or was there another
> > > > register operation which removed the mailbox message?
> > > >
> > >
> > > I explained the root cause of the issue in the last commit. Maybe I should
> > > have added the explanation to the cover letter as well. Anyway, here is a
> > > partial verbatim copy of it:
> > >
> > > "During testing of SR-IOV, Red Hat QE encountered an issue where the
> > > ip link up command intermittently fails for the igbvf interfaces when
> > > using the PREEMPT_RT variant. Investigation revealed that
> > > e1000_write_posted_mbx returns an error due to the lack of an ACK
> > > from e1000_poll_for_ack.
> >
> > That ACK would have come if it would poll longer?
> >
> No, the service wouldn't be serviced while polling.

Hmm.

> > > The underlying issue arises from the fact that IRQs are threaded by
> > > default under PREEMPT_RT. While the exact hardware details are not
> > > available, it appears that the IRQ handled by igb_msix_other must
> > > be processed before e1000_poll_for_ack times out. However,
> > > e1000_write_posted_mbx is called with preemption disabled, leading
> > > to a scenario where the IRQ is serviced only after the failure of
> > > e1000_write_posted_mbx."
> >
> > Where is this disabled preemption coming from? This should be one of the
> > ops.write_posted() calls, right? I've been looking around and don't see
> > anything obvious.
> >
> I don't remember if I found the answer by looking at the code or by
> looking at the ftrace flags.
> I am currently on sick leave with covid. I can check it when I come back.

Don't worry, get better first. I'm kind of off myself. I'm not sure I have
the hardware needed to set this up so I can look at it…

Sebastian
