Kostik Belousov wrote:
First, I set the followup to the right mailing list.

Second, I am really curious about what you did. My understanding is as
follows: you set up a vnode-backed md device (md0a) on a sparse file,
created UFS2 on it, mounted it with quotas, and ran background fsck on
that filesystem. At the same time, you ran rm on the snapshot file
created by fsck. Right?

This is the procedure I followed. While I have quotas enabled, they
were not set on the test filesystem:

1) dd if=/dev/zero of=/usr/bigfile bs=1024 seek=209715200 count=0
2) mdconfig -a -t vnode -f /usr/bigfile
3) bsdlabel -w md0 auto
4) newfs -U md0a
5) fsck -v /dev/md0a # ^C this after a second or so; this leaves the FS dirty
6) mount /dev/md0a /mnt
7) fsck -v -B /dev/md0a

In another window:
8) while true; do ls -al /mnt/.snap; sleep 1; done


Anyway, the problem seems to be related to neither snapshots nor
quotas. In your trace, process 35 (syncer) tries to sync the vnode
0xc363c414, which is inode 1515 on aacd0s1f, the file backing md0. That
vnode is already locked by process 515 (the md0 kthread). Process 515
is stuck in the wdrain state, waiting for buffers to be flushed. It
seems that a huge amount of dirty buffers is queued for md0, caused by
snapshotting the filesystem. As a result, the system deadlocks: md0
hangs waiting for running buffer space, which is occupied by the
pending write requests to md0 itself.
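
For readers unfamiliar with the mechanism: "wdrain" is the wait channel
of the buffer cache write throttle. The sketch below paraphrases
waitrunningbufspace() from sys/kern/vfs_bio.c (from memory; names and
thresholds are approximate) to show why the wait cannot end here: every
async bufwrite() charges the buffer to runningbufspace and sleeps once
the total passes the high watermark, but space is only released when
writes complete, and these writes can only complete through the md0
thread that is itself asleep in this loop.

    /*
     * Paraphrase of waitrunningbufspace() in sys/kern/vfs_bio.c
     * (approximate; variable names from memory).
     */
    static void
    waitrunningbufspace(void)
    {
            mtx_lock(&rbreqlock);
            while (runningbufspace > hirunningspace) {
                    ++runningbufreq;
                    /* Sleep until completed writes release space. */
                    msleep(&runningbufreq, &rbreqlock, PVM,
                        "wdrain", 0);
            }
            mtx_unlock(&rbreqlock);
    }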

Do -fs@ readers agree with this analysis?

I propose setting the TDP_NORUNNINGBUF thread flag for both swap- and
file-backed md threads to prevent such deadlocks. That I/O is already
accounted for in the upper layer. Moreover, those already-accounted
requests do not really differ from the requests (re)issued by md.
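
Concretely, the change I have in mind is along these lines (a sketch
only, not the actual patch): the md worker thread marks itself with the
flag, and bufwrite() skips the running-buffer throttle for flagged
threads.

    /* In md_kthread() in sys/dev/md.c: exempt this thread's writes
     * to the backing store from the "wdrain" throttle. */
    curthread->td_pflags |= TDP_NORUNNINGBUF;

    /* In bufwrite() in sys/kern/vfs_bio.c: honor the flag before
     * waiting for running buffer space to drain. */
    if ((curthread->td_pflags & TDP_NORUNNINGBUF) == 0)
            waitrunningbufspace();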

Please, comment.

FYI, -CURRENT passes this test without locking up, so the fix is already there somewhere.
