On Mon 30-10-23 15:08:56, Mikulas Patocka wrote:
> On Mon, 30 Oct 2023, Marek Marczykowski-Górecki wrote:
>
> > > Well, it would be possible that larger pages in a bio would trip e.g. bio
> > > splitting due to maximum segment size the disk supports (which can be e.g.
> > > 0xffff) and that upsets something somewhere. But this is pure
> > > speculation. We definitely need more debug data to be able to tell more.
> >
> > I can collect more info, but I need some guidance how :) Some patch
> > adding extra debug messages?
> > Note I collect those via serial console (writing to disk doesn't work
> > when it freezes), and that has some limits in the amount of data I can
> > extract especially when printed quickly. For example sysrq-t is too much.
> > Or maybe there is some trick to it, like increasing log_buf_len?
>
> If you can do more tests, I would suggest this:
>
> We already know that it works with order 3 and doesn't work with order 4.
>
> So, in the file include/linux/mmzone.h, change PAGE_ALLOC_COSTLY_ORDER
> from 3 to 4 and in the file drivers/md/dm-crypt.c leave "unsigned int
> order = PAGE_ALLOC_COSTLY_ORDER" there.
>
> Does it deadlock or not?
>
> That way we can see whether the deadlock depends on
> PAGE_ALLOC_COSTLY_ORDER or whether it is just a coincidence.

Good idea. Also, if the kernel hangs, please find the kcryptd processes.
What state are they in? If they are sleeping, please send the contents of
/proc/<kcryptd-pid>/stack. Thanks!
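
In case it helps, something along these lines could collect that in one go.
This is only a minimal sketch under a few assumptions: the function name
dump_kcryptd_stacks is made up for illustration, the thread-name patterns
(anything containing "kcryptd", plus "dmcrypt_write") are my guess at how the
dm-crypt workers show up on your kernel, and reading /proc/<pid>/stack
normally needs root and a kernel built with CONFIG_STACKTRACE.

```shell
#!/bin/sh
# Sketch: print the state and kernel stack of dm-crypt worker threads.
# The optional first argument overrides the proc root (defaults to /proc);
# that parameter exists only to make the function easy to try out.
dump_kcryptd_stacks() {
    proc_root="${1:-/proc}"
    for comm in "$proc_root"/[0-9]*/comm; do
        # Skip entries that vanished or when the glob matched nothing.
        [ -r "$comm" ] || continue
        name="$(cat "$comm" 2>/dev/null || true)"
        case "$name" in
            # Assumed name patterns; adjust to what ps shows on your system.
            *kcryptd*|dmcrypt_write*) ;;
            *) continue ;;
        esac
        dir="${comm%/comm}"
        pid="${dir##*/}"
        # The "State:" line of status reads e.g. "D (disk sleep)".
        state="$(sed -n 's/^State:[[:space:]]*//p' "$dir/status" 2>/dev/null || true)"
        echo "== pid $pid ($name) state: $state"
        cat "$dir/stack" 2>/dev/null || echo "   (stack not readable)"
    done
}

# Usage (as root): dump_kcryptd_stacks
```

Running it from a root shell on the serial console should keep the output
small compared to sysrq-t, since only the matching threads are printed.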

								Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR