On Thu, Jan 12, 2017 at 10:37:18PM +0000, Al Viro wrote:
> On Thu, Jan 12, 2017 at 02:26:42PM -0800, Linus Torvalds wrote:
> > On Thu, Jan 12, 2017 at 12:26 PM, Alan J. Wylie <a...@wylie.me.uk> wrote:
> > >
> > > Strace shows that the processes are hanging in write() and read() calls.
> >
> > If this is splice-related, I'm assuming that they aren't actually the
> > two ends of the same pipe, and there is somebody doing splice in the
> > middle.
> >
> > I'm not seeing that process. I'm assuming it's systemd. Can you try
> > to find it and strace that one too? Because that middle man is likely
> > the one that has problems (and is not able to splice from one pipe to
> > the other).
> >
> > Ugh. That one commit has had a lot of bugs in it already. We do not
> > have good splice test coverage, because almost nobody uses it.
>
> FWIW, I would really like to know what kind of files had been involved.
> There are two paths that can lead to default_file_splice_read():
> splice_direct_to_actor() -> do_splice_to() -> default_file_splice_read() and
> do_splice() -> do_splice_to() -> default_file_splice_read().
>
> The former only gets there for regular files and block devices. The latter
> is guaranteed that file is not a pipe. So
>	* not a socket (have ->splice_read() of their own)
>	* not a pipe or FIFO (neither path allows those)
>	* not a block device (have ->splice_read() of their own)
>	* not a regular file on a normal local fs (ditto)
>
> So what is it called for in that reproducer?

PS: what about the /proc/mounts contents? If it's something like a
9p-backed kvm guest, your bisect might have been caught on the bug I'd
mentioned - if the breakage you are seeing in 4.9.3 has started after that
commit and before the backport of the fix, your bisect could converge there.
Does the reproducer trigger on 523ac9afc73a + cherry-pick of 8e54cadab447?
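
For the "what kind of files" question, something along these lines might help narrow it down. It is only a rough userspace sketch, not anything from the kernel tree: it walks /proc/<pid>/fd for a hung process and reports whether each descriptor is a regular file, block device, pipe, socket and so on. The helper names (file_kind, describe_fds) are made up for illustration, and it assumes a Linux /proc and permission to inspect the target process.

```python
import os
import stat
import sys


def file_kind(mode):
    """Map an st_mode value to a short description of the file type."""
    if stat.S_ISREG(mode):
        return "regular file"
    if stat.S_ISBLK(mode):
        return "block device"
    if stat.S_ISCHR(mode):
        return "character device"
    if stat.S_ISFIFO(mode):
        return "pipe or FIFO"
    if stat.S_ISSOCK(mode):
        return "socket"
    if stat.S_ISDIR(mode):
        return "directory"
    return "other"


def describe_fds(pid):
    """Return {fd: (kind, target)} for the open descriptors of a process.

    Hypothetical helper: relies on the Linux /proc/<pid>/fd layout.
    """
    fd_dir = "/proc/%d/fd" % pid
    result = {}
    for name in os.listdir(fd_dir):
        path = os.path.join(fd_dir, name)
        try:
            # os.stat() follows the /proc symlink to the underlying file.
            kind = file_kind(os.stat(path).st_mode)
            target = os.readlink(path)
        except OSError:
            continue  # descriptor went away while we were looking
        result[int(name)] = (kind, target)
    return result


if __name__ == "__main__":
    # Usage: python3 fdkinds.py <pid>   (defaults to this process)
    pid = int(sys.argv[1]) if len(sys.argv) > 1 else os.getpid()
    for fd, (kind, target) in sorted(describe_fds(pid).items()):
        print("%3d  %-16s %s" % (fd, kind, target))
```

Running it against the pids that strace shows stuck in read()/write(), and against whatever is sitting in the middle, together with /proc/mounts, should show which filesystem the non-pipe end lives on.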