Eddie Horng:
> It seems all tasks are trying to lock sbinfo->si_xib_mtx, but log shows
> nobody is holding it. I also got similar problem in my codefs, but it's
> very rare, not like this case now I can almost always reproduce it. To
> isolate issue, I changed ro branch from codefs to ext4 and still can
> reproduce.
> List reproduce step as below, maybe can give you some hint:
> - The codebase is an android source package
> - make -j128 (found -j32 can also repro)
> - ctrl-c to interrupt build
> - the build system -- ninja will kill all subprocesses
> - In most case a clang++.real process will left running forever with high
> cpu%

It looks like a livelock (rather than a deadlock).
I still cannot see the whole scenario, but I've found a suspicious, ah
no, "related" commit in linux-v4.10-rc7:
        5abf186 2017-02-03 mm, fs: check for fatal signals in do_generic_file_read()

For aufs, the side effect of this commit is very similar to your
problem. It can cause a livelock in aufs when reading from aufs XINO
files.

There was a similar commit in linux-v4.3, but that one affected the
"write to XINO" case instead of the "read from XINO" case.
        296291c 2015-10-23 mm: make sendfile(2) killable
and I fixed it by the commit
        5e439ff 2016-01-05 aufs: for 4.3, XINO handles EINTR from the dying process
which made aufs4.3 retry "write to XINO" in another context after
EINTR.

Hmm, if the cause of your problem is the endless loop after EINTR in
"read from XINO", then it should be solved by the same approach as in
aufs4.3. It requires a small conversion from "write to XINO" to
"read from XINO".
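
To illustrate the idea only (this is NOT the attached patch and not
aufs code), here is a small userspace sketch in C. Every name in it
(task_ctx, xino_read_sim, read_retry_same_ctx, read_retry_other_ctx)
is hypothetical. It assumes that a read attempted by a task with a
pending fatal signal always fails with EINTR, so retrying in the same
task never ends, while retrying in a context that has no pending
signal succeeds.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the state of the task doing the I/O. */
struct task_ctx {
    bool fatal_signal_pending;  /* true once SIGKILL has been sent */
};

/*
 * Hypothetical stand-in for a read that checks for fatal signals
 * first: it fails with -EINTR when the caller is dying, otherwise
 * it copies len bytes and returns len.
 */
static long xino_read_sim(const struct task_ctx *ctx, const char *src,
                          char *dst, size_t len)
{
    assert(src != NULL && dst != NULL);
    if (ctx->fatal_signal_pending)
        return -EINTR;
    for (size_t i = 0; i < len; i++)
        dst[i] = src[i];
    return (long)len;
}

/*
 * Problematic pattern: retry in the SAME context on -EINTR. The
 * pending signal never goes away, so every attempt fails. max_tries
 * exists only to let this demo terminate; without such a bound the
 * loop would spin forever, which is the livelock.
 */
static long read_retry_same_ctx(const struct task_ctx *ctx,
                                const char *src, char *dst, size_t len,
                                unsigned max_tries, unsigned *tries)
{
    long ret = -EINTR;

    *tries = 0;
    while (ret == -EINTR && *tries < max_tries) {
        ret = xino_read_sim(ctx, src, dst, len);
        (*tries)++;
    }
    return ret;
}

/*
 * Sketch of the other approach: on -EINTR, retry once in ANOTHER
 * context that has no fatal signal pending. The "worker" here only
 * models such a context; a real implementation would hand the I/O
 * to a different thread.
 */
static long read_retry_other_ctx(const struct task_ctx *ctx,
                                 const char *src, char *dst, size_t len)
{
    long ret = xino_read_sim(ctx, src, dst, len);

    if (ret == -EINTR) {
        struct task_ctx worker = { .fatal_signal_pending = false };
        ret = xino_read_sim(&worker, src, dst, len);
    }
    return ret;
}
```

With a "dying" context (fatal_signal_pending set), the first function
uses up all of its attempts and still returns -EINTR, while the second
one returns the full length after a single retry in the worker context.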

The conditions to reproduce the problem should be like these.
- linux-v4.10-rc7 and later
- aufs has many files (over 32K)
- the process is killed by SIGKILL

I've tried, but I could not reproduce the problem in the beginning...
I'd ask you to test this patch. If you can, please try.


J. R. Okajima

Attachment: a.patch.bz2
Description: BZip2 compressed data
