On Tue, 9 Jan 2024 at 19:39, Steinar H. Gunderson <se...@debian.org> wrote:

> On Tue, Jan 09, 2024 at 07:34:56PM +0100, Manolis Stamatogiannakis wrote:
> > I also gave it a shot reproducing on a RPi Zero 2, but it's either too fast
> > (even with cpulimit), or the issue is architecture-specific and does not
> > manifest on ARMv7.
>
> Can I claim “this is obviously a kernel bug” and pass the buck? :-)
>

Let's cc Linus then. :-D


>
> I mean, if you would be willing to give out access on your machine,
> I could SSH in and debug there, but obviously this is a pretty narrow
> case no matter what the actual issue is.
>

I wish that were straightforward, but this is a NAS box (lots of personal
files) and the OS shares the same disks with the storage array. So it is
not easy to sanitize it well enough for sharing access. Plus, it's
dog-slow, so I'm not sure you would enjoy the process ;-) It takes around
36 seconds to recompile a single file and relink the binary with ninja.

But I take this as a good opportunity to learn a bit about io_uring, so
I'll give it a shot myself. From my first experiments, it appears that the
code is deadlocking somewhere in IOUringEngine::finish(). It also looks like
a timing-related bug: adding a couple of dprintf()s and running with
--debug is enough to get things moving again. I'll probably continue
debugging on Friday.

I have temporarily cloned the repo on GH [1] if you have time to check it
out. I'm not sure if this is the right place for discussing code, so maybe
we should switch to GH comments or private emails until there's some
outcome (?).

Best regards,
Manolis

[1] https://github.com/m000/plocate/pull/1
