> Thanks Kris, these are exactly the clues I needed. Since the deadlock
> during a snapshot is fairly easy to reproduce, I did so and collected this
> information below. "alltrace" didn't work as I expected (didn't produce a
> trace), so I traced each pid associated with a locked vnode separately.
The vnode syncing loop in ffs_sync() has some problems:

1. Softupdate processing performed after the loop has started might trigger
   the need for retrying the loop. Processing of dirrem work items can cause
   IN_CHANGE to be set on some inodes, causing a deadlock in ufs_inactive()
   later on, while the file system is suspended.

2. nvp might no longer be associated with the same mount point after
   MNT_IUNLOCK(mp) has been called in the loop. This can cause the vnode
   list traversal to be incomplete, leaving stale information in the
   snapshot. Further damage can occur when background fsck uses that stale
   information.

Just a few lines down from that loop is a new problem:

3. softdep_flushworklist() might not have processed all dirrem work items
   associated with the file system even when both error and count are zero.
   This can cause both background fsck and softupdate processing (after the
   file system has been resumed) to decrement the link count of an inode,
   causing file system corruption or a panic. Processing these work items
   while the file system is suspended causes a panic.

- Tor Egge
_______________________________________________
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"