Re: kqueue LOR
Kostik Belousov wrote: On Fri, Sep 07, 2007 at 11:49:50AM -0700, Maxim Sobolev wrote: Hi, On my 6.2 system I am seeing LOR discussed almost 1 year ago here: http://lists.freebsd.org/pipermail/freebsd-stable/2006-November/031048.html http://lists.freebsd.org/pipermail/freebsd-stable/2006-December/031197.html lock order reversal: 1st 0xc52cb500 kqueue (kqueue) @ kern/kern_event.c:1547 2nd 0xc4e4d80c struct mount mtx (struct mount mtx) @ ufs/ufs/ufs_vnops.c:138 Do you have any plans to commit the suggested fix? I suspect that the LOR is bogus. I was never able to get the information where the reverse lock order happen. What I asked of the most reporters is to apply sys/kern/subr_witness.c rev. 1.222 to RELENG_6 and provide me with the LOR report, if any. What do you mean bogus? It happens reliably on my system. Note that doing that on RELENG_6_2 makes no sense, most likely you will get LORs with cdev mutex, fixed in RELENG_6. I don't quite understand that. -Maxim ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to [EMAIL PROTECTED]
Re: kqueue LOR
On Sat, Sep 08, 2007 at 12:58:24AM -0700, Maxim Sobolev wrote: Kostik Belousov wrote: On Fri, Sep 07, 2007 at 11:49:50AM -0700, Maxim Sobolev wrote: Hi, On my 6.2 system I am seeing LOR discussed almost 1 year ago here: http://lists.freebsd.org/pipermail/freebsd-stable/2006-November/031048.html http://lists.freebsd.org/pipermail/freebsd-stable/2006-December/031197.html lock order reversal: 1st 0xc52cb500 kqueue (kqueue) @ kern/kern_event.c:1547 2nd 0xc4e4d80c struct mount mtx (struct mount mtx) @ ufs/ufs/ufs_vnops.c:138 Do you have any plans to commit the suggested fix? I suspect that the LOR is bogus. I was never able to get the information where the reverse lock order happen. What I asked of the most reporters is to apply sys/kern/subr_witness.c rev. 1.222 to RELENG_6 and provide me with the LOR report, if any. What do you mean bogus? It happens reliably on my system. Bogus means that reported LOR, most likely, do not cause actual deadlock. Note that doing that on RELENG_6_2 makes no sense, most likely you will get LORs with cdev mutex, fixed in RELENG_6. I don't quite understand that. RELENG_6_2 had known LOR that already was fixed in RELENG_6, between cdev mutex and and sleep mtxpool, namely LOR #197. After user have applyed rev. 1.222 of subr_witness, I usuallygot the reports of that LOR, if any, but not the LOR I needed. pgpbplIHuZo3G.pgp Description: PGP signature
kqueue LOR
Hi, On my 6.2 system I am seeing LOR discussed almost 1 year ago here: http://lists.freebsd.org/pipermail/freebsd-stable/2006-November/031048.html http://lists.freebsd.org/pipermail/freebsd-stable/2006-December/031197.html lock order reversal: 1st 0xc52cb500 kqueue (kqueue) @ kern/kern_event.c:1547 2nd 0xc4e4d80c struct mount mtx (struct mount mtx) @ ufs/ufs/ufs_vnops.c:138 Do you have any plans to commit the suggested fix? Thanks! -Maxim ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to [EMAIL PROTECTED]
Re: kqueue LOR
On Fri, Sep 07, 2007 at 11:49:50AM -0700, Maxim Sobolev wrote: Hi, On my 6.2 system I am seeing LOR discussed almost 1 year ago here: http://lists.freebsd.org/pipermail/freebsd-stable/2006-November/031048.html http://lists.freebsd.org/pipermail/freebsd-stable/2006-December/031197.html lock order reversal: 1st 0xc52cb500 kqueue (kqueue) @ kern/kern_event.c:1547 2nd 0xc4e4d80c struct mount mtx (struct mount mtx) @ ufs/ufs/ufs_vnops.c:138 Do you have any plans to commit the suggested fix? I suspect that the LOR is bogus. I was never able to get the information where the reverse lock order happen. What I asked of the most reporters is to apply sys/kern/subr_witness.c rev. 1.222 to RELENG_6 and provide me with the LOR report, if any. Note that doing that on RELENG_6_2 makes no sense, most likely you will get LORs with cdev mutex, fixed in RELENG_6. pgpPGMOEP0bmE.pgp Description: PGP signature
Re: kqueue LOR
On 27. nov. 2006, at 10.21, Kostik Belousov wrote: On Sun, Nov 26, 2006 at 09:30:39AM +0100, V??clav Haisman wrote: Hi, the attached lor.txt contains LOR I got this yesterday. It is FreeBSD 6.1 with relatively recent kernel, from last week or so. -- VH +lock order reversal: + 1st 0xc537f300 kqueue (kqueue) @ /usr/src/sys/kern/kern_event.c: 1547 + 2nd 0xc45c22dc struct mount mtx (struct mount mtx) @ /usr/src/ sys/ufs/ufs/ufs_vnops.c:138 +KDB: stack backtrace: +kdb_backtrace(c07f9879,c45c22dc,c07fd31c,c07fd31c,c080c7b2,...) at kdb_backtrace+0x2f +witness_checkorder(c45c22dc,9,c080c7b2,8a,c07fc6bd,...) at witness_checkorder+0x5fe +_mtx_lock_flags(c45c22dc,0,c080c7b2,8a,e790ba20,...) at _mtx_lock_flags+0x32 +ufs_itimes(c47a0dd0,c47a0e90,e790ba78,c060e1cc,c47a0dd0,...) at ufs_itimes+0x6c +ufs_getattr(e790ba54,e790baec,c0622af6,c0896f40,e790ba54,...) at ufs_getattr+0x20 +VOP_GETATTR_APV(c0896f40,e790ba54,c08a5760,c47a0dd0,e790ba74,...) at VOP_GETATTR_APV+0x3a +filt_vfsread(c4cf261c,6,c07f445e,60b,0,...) at filt_vfsread+0x75 +knote(c4f57114,6,1,1f30c2af,1f30c2af,...) at knote+0x75 +VOP_WRITE_APV(c0896f40,e790bbec,c47a0dd0,227,e790bcb4,...) at VOP_WRITE_APV+0x148 +vn_write(c45d5120,e790bcb4,c5802a00,0,c4b73a80,...) at vn_write +0x201 +dofilewrite(c4b73a80,1b,c45d5120,e790bcb4,,...) at dofilewrite+0x84 +kern_writev(c4b73a80,1b,e790bcb4,8220c71,0,...) at kern_writev+0x65 +write(c4b73a80,e790bd04,c,c07d899c,3,...) at write+0x4f +syscall(3b,3b,bfbf003b,0,bfbfeae4,...) at syscall+0x295 +Xint0x80_syscall() at Xint0x80_syscall+0x1f +--- syscall (4, FreeBSD ELF32, write), eip = 0x2831d727, esp = 0xbfbfea1c, ebp = 0xbfbfea48 --- Thank you for the report. The LOR is caused by my commit into sys/ufs/ufs/ufs_vnops.c, rev. 1.280. While debugging a problem I have with 6.2-RELEASE on one of my servers I saw this LOR. After being up for a short while the server freezes, not responding to serial console, network og keyboard. I can't even get to DDB by sending BREAK on the serial console. Enabling INVARIANTS, INVARIANT_SUPPORT, WITNESS and WITNESS_SKIPSPIN did not give more information about the freeze other than printing the LOR now and then. The LOR I am getting is exactly the same except the calls are made to writev instead of write. What application you run that triggers the LOR ? Patch below is one possible approach to fixing it. I am seeing this on a front-end MX server, I can trigger it by running tail -f /var/log/maillog, the LOR is printed before any output is printed by tail. After triggering it once, it will not trigger regularilly until waiting for some time. Waiting 180 seconds seems to be good to make it happen every time, but it can be triggered earlier. My maillog grows about 976K during that time. May this LOR have something to do with the system freze I am experiencing? Should I try the patch in your mail from november 27. or december 13? Or has some other fix emerged since then? -- Frode Nordahl ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to [EMAIL PROTECTED]
Re: kqueue LOR
On Thu, Feb 08, 2007 at 09:49:23PM +0100, Frode Nordahl wrote: On 27. nov. 2006, at 10.21, Kostik Belousov wrote: On Sun, Nov 26, 2006 at 09:30:39AM +0100, V??clav Haisman wrote: Hi, the attached lor.txt contains LOR I got this yesterday. It is FreeBSD 6.1 with relatively recent kernel, from last week or so. -- VH +lock order reversal: + 1st 0xc537f300 kqueue (kqueue) @ /usr/src/sys/kern/kern_event.c: 1547 + 2nd 0xc45c22dc struct mount mtx (struct mount mtx) @ /usr/src/ sys/ufs/ufs/ufs_vnops.c:138 +KDB: stack backtrace: +kdb_backtrace(c07f9879,c45c22dc,c07fd31c,c07fd31c,c080c7b2,...) at kdb_backtrace+0x2f +witness_checkorder(c45c22dc,9,c080c7b2,8a,c07fc6bd,...) at witness_checkorder+0x5fe +_mtx_lock_flags(c45c22dc,0,c080c7b2,8a,e790ba20,...) at _mtx_lock_flags+0x32 +ufs_itimes(c47a0dd0,c47a0e90,e790ba78,c060e1cc,c47a0dd0,...) at ufs_itimes+0x6c +ufs_getattr(e790ba54,e790baec,c0622af6,c0896f40,e790ba54,...) at ufs_getattr+0x20 +VOP_GETATTR_APV(c0896f40,e790ba54,c08a5760,c47a0dd0,e790ba74,...) at VOP_GETATTR_APV+0x3a +filt_vfsread(c4cf261c,6,c07f445e,60b,0,...) at filt_vfsread+0x75 +knote(c4f57114,6,1,1f30c2af,1f30c2af,...) at knote+0x75 +VOP_WRITE_APV(c0896f40,e790bbec,c47a0dd0,227,e790bcb4,...) at VOP_WRITE_APV+0x148 +vn_write(c45d5120,e790bcb4,c5802a00,0,c4b73a80,...) at vn_write +0x201 +dofilewrite(c4b73a80,1b,c45d5120,e790bcb4,,...) at dofilewrite+0x84 +kern_writev(c4b73a80,1b,e790bcb4,8220c71,0,...) at kern_writev+0x65 +write(c4b73a80,e790bd04,c,c07d899c,3,...) at write+0x4f +syscall(3b,3b,bfbf003b,0,bfbfeae4,...) at syscall+0x295 +Xint0x80_syscall() at Xint0x80_syscall+0x1f +--- syscall (4, FreeBSD ELF32, write), eip = 0x2831d727, esp = 0xbfbfea1c, ebp = 0xbfbfea48 --- Thank you for the report. The LOR is caused by my commit into sys/ufs/ufs/ufs_vnops.c, rev. 1.280. While debugging a problem I have with 6.2-RELEASE on one of my servers I saw this LOR. After being up for a short while the server freezes, not responding to serial console, network og keyboard. I can't even get to DDB by sending BREAK on the serial console. Enabling INVARIANTS, INVARIANT_SUPPORT, WITNESS and WITNESS_SKIPSPIN did not give more information about the freeze other than printing the LOR now and then. The useful way to report deadlock is described in developer handbook, kernel debug chapter, deadlock debugging. If the freeze caused by this LOR, then supplying me with information requested in that chapter could be helpful (both to you and me). pgpsIgstFJaJP.pgp Description: PGP signature
Re: kqueue LOR
On Wed, Dec 13, 2006 at 04:12:57AM +, Tor Egge wrote: Hmm, may be, since vnode must be interlocked by ffs_sync() after MNTK_SUSPENDED set, and before MNTK_SUSPEND, mount interlock is not needed in ufs_itimes. Tor ? If neither IN_CHANGE nor IN_UPDATE is set then it might be unsafe to set IN_MODIFIED in ufs_itimes() since the file system might be suspended or in the process of being suspended with the vnode sync loop in ffs_sync() having iterated past the vnode. This was exactly the reason for IN_LAZYACCESS flag introduction. It is set when fs is suspending or suspended, and no IN_CHANGE or IN_UPDATE flags are set. I don't think the mount interlock is needed to check for MNTK_SUSPEND being set in ufs_itimes() as long as the vnode interlock is held. If a stale value is read without MNTK_SUSPEND set then the vnode sync loop in ffs_sync() cannot have iterated past the vnode, thus it should still be safe to set IN_MODIFIED. All writes by the CPU performing the vnode sync loop before it released the vnode interlock for the same vnode should be visible to the CPU in ufs_itimes() after it has obtained the vnode interlock. I think that you statement is valid for both MNTK_SUSPEND and MNTK_SUSPENDED flags. In other words, aquision of mount interlock could be safely removed from the ufs_itimes(), as was suggested by [EMAIL PROTECTED] Index: ufs/ufs/ufs_vnops.c === RCS file: /usr/local/arch/ncvs/src/sys/ufs/ufs/ufs_vnops.c,v retrieving revision 1.283 diff -u -r1.283 ufs_vnops.c --- ufs/ufs/ufs_vnops.c 6 Nov 2006 13:42:09 - 1.283 +++ ufs/ufs/ufs_vnops.c 13 Dec 2006 14:10:31 - @@ -133,19 +133,15 @@ { struct inode *ip; struct timespec ts; - int mnt_locked; ip = VTOI(vp); - mnt_locked = 0; - if ((vp-v_mount-mnt_flag MNT_RDONLY) != 0) { - VI_LOCK(vp); + VI_LOCK(vp); + if ((vp-v_mount-mnt_flag MNT_RDONLY) != 0) goto out; + if ((ip-i_flag (IN_ACCESS | IN_CHANGE | IN_UPDATE)) == 0) { + VI_UNLOCK(vp); + return; } - MNT_ILOCK(vp-v_mount); /* For reading of mnt_kern_flags. */ - mnt_locked = 1; - VI_LOCK(vp); - if ((ip-i_flag (IN_ACCESS | IN_CHANGE | IN_UPDATE)) == 0) - goto out_unl; if ((vp-v_type == VBLK || vp-v_type == VCHR) !DOINGSOFTDEP(vp)) ip-i_flag |= IN_LAZYMOD; @@ -172,10 +168,7 @@ out: ip-i_flag = ~(IN_ACCESS | IN_CHANGE | IN_UPDATE); - out_unl: VI_UNLOCK(vp); - if (mnt_locked) - MNT_IUNLOCK(vp-v_mount); } /* [ It seems that MNTK_SUSPENDED flag implies MNTK_SUSPEND. ] pgpMGzCo3rusc.pgp Description: PGP signature
Re: kqueue LOR
On Tuesday 12 December 2006 21:48, Bruce Evans wrote: Memory barriers just specify ordering, they don't ensure a cache flush so another CPU reads up to date values. You can use memory barriers in conjunction with atomic operations on a variable to ensure that you can safely read other variables (which is what locks do). For example, in this I thought that the acquire/release variants of atomic ops guarantee this. They seem to be documented to do this, while mutexes don't seem to be documented to do this. The MI (?) implementation of mutexes depends on atomic_cmpset_{acq,rel}_ptr() doing this. The acq/rel just specify ordering. As Attilio mentioned, we assume that the atomic_cmpset() that sets the contested flag will fail while racing with another CPU (even if the CPU can't see the new value, as long as it fails and keeps spinning mutexes will still work). The 'rel' barrier on CPU A when releasing a lock forces all the other writes to be posted (and eventually become visible) to other CPUs before the write that releases the lock. The 'acq' barrier on CPU B when acquiring the lock forces the CPU to not reorder any reads before it acquires the lock, so this makes you not read any data until you have the lock. Thus, once CPU B has waited long enough to see the write from A to release the lock, we know that 1) it can also see all the other writes from that CPU that the lock protected, and 2) B hasn't tried to read any of them yet so it shouldn't have any stale values in registers. None of this requires the OS to do a cache flush. (If you have an SMP system where the cache can still hold stale values after another CPU updates values in memory where it is visible to the CPU acquiring the lock, then _acq might need to flush the cache, but that would be a property of that architecture. However, even that would not require cache flushes so long as the stale values were evicted from the cache such that they honor the memory barrier and you don't see the new value of the lock until you see the new values of the earlier data.) -- John Baldwin ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to [EMAIL PROTECTED]
Re: kqueue LOR
Kostik Belousov wrote: On Sun, Nov 26, 2006 at 09:30:39AM +0100, V??clav Haisman wrote: Hi, the attached lor.txt contains LOR I got this yesterday. It is FreeBSD 6.1 with relatively recent kernel, from last week or so. -- VH +lock order reversal: + 1st 0xc537f300 kqueue (kqueue) @ /usr/src/sys/kern/kern_event.c:1547 + 2nd 0xc45c22dc struct mount mtx (struct mount mtx) @ /usr/src/sys/ufs/ufs/ufs_vnops.c:138 +KDB: stack backtrace: +kdb_backtrace(c07f9879,c45c22dc,c07fd31c,c07fd31c,c080c7b2,...) at kdb_backtrace+0x2f +witness_checkorder(c45c22dc,9,c080c7b2,8a,c07fc6bd,...) at witness_checkorder+0x5fe +_mtx_lock_flags(c45c22dc,0,c080c7b2,8a,e790ba20,...) at _mtx_lock_flags+0x32 +ufs_itimes(c47a0dd0,c47a0e90,e790ba78,c060e1cc,c47a0dd0,...) at ufs_itimes+0x6c +ufs_getattr(e790ba54,e790baec,c0622af6,c0896f40,e790ba54,...) at ufs_getattr+0x20 +VOP_GETATTR_APV(c0896f40,e790ba54,c08a5760,c47a0dd0,e790ba74,...) at VOP_GETATTR_APV+0x3a +filt_vfsread(c4cf261c,6,c07f445e,60b,0,...) at filt_vfsread+0x75 +knote(c4f57114,6,1,1f30c2af,1f30c2af,...) at knote+0x75 +VOP_WRITE_APV(c0896f40,e790bbec,c47a0dd0,227,e790bcb4,...) at VOP_WRITE_APV+0x148 +vn_write(c45d5120,e790bcb4,c5802a00,0,c4b73a80,...) at vn_write+0x201 +dofilewrite(c4b73a80,1b,c45d5120,e790bcb4,,...) at dofilewrite+0x84 +kern_writev(c4b73a80,1b,e790bcb4,8220c71,0,...) at kern_writev+0x65 +write(c4b73a80,e790bd04,c,c07d899c,3,...) at write+0x4f +syscall(3b,3b,bfbf003b,0,bfbfeae4,...) at syscall+0x295 +Xint0x80_syscall() at Xint0x80_syscall+0x1f +--- syscall (4, FreeBSD ELF32, write), eip = 0x2831d727, esp = 0xbfbfea1c, ebp = 0xbfbfea48 --- Thank you for the report. The LOR is caused by my commit into sys/ufs/ufs/ufs_vnops.c, rev. 1.280. Is the mount lock really required, if all we're doing is a single read of a single word (mnt_kern_flags) (v_mount should be read-only for the whole lifetime of the vnode, I believe)? After all, reads of a single word are atomic on all our supported architectures. The only situation I see where there MIGHT be problems are forced unmounts, but I think there are bigger issues with those. Sorry for noticing this email only now. -- Suleiman ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to [EMAIL PROTECTED]
Re: kqueue LOR
On Tue, Dec 12, 2006 at 12:44:54AM -0800, Suleiman Souhlal wrote: Kostik Belousov wrote: On Sun, Nov 26, 2006 at 09:30:39AM +0100, V??clav Haisman wrote: Hi, the attached lor.txt contains LOR I got this yesterday. It is FreeBSD 6.1 with relatively recent kernel, from last week or so. -- VH +lock order reversal: + 1st 0xc537f300 kqueue (kqueue) @ /usr/src/sys/kern/kern_event.c:1547 + 2nd 0xc45c22dc struct mount mtx (struct mount mtx) @ /usr/src/sys/ufs/ufs/ufs_vnops.c:138 +KDB: stack backtrace: +kdb_backtrace(c07f9879,c45c22dc,c07fd31c,c07fd31c,c080c7b2,...) at kdb_backtrace+0x2f +witness_checkorder(c45c22dc,9,c080c7b2,8a,c07fc6bd,...) at witness_checkorder+0x5fe +_mtx_lock_flags(c45c22dc,0,c080c7b2,8a,e790ba20,...) at _mtx_lock_flags+0x32 +ufs_itimes(c47a0dd0,c47a0e90,e790ba78,c060e1cc,c47a0dd0,...) at ufs_itimes+0x6c +ufs_getattr(e790ba54,e790baec,c0622af6,c0896f40,e790ba54,...) at ufs_getattr+0x20 +VOP_GETATTR_APV(c0896f40,e790ba54,c08a5760,c47a0dd0,e790ba74,...) at VOP_GETATTR_APV+0x3a +filt_vfsread(c4cf261c,6,c07f445e,60b,0,...) at filt_vfsread+0x75 +knote(c4f57114,6,1,1f30c2af,1f30c2af,...) at knote+0x75 +VOP_WRITE_APV(c0896f40,e790bbec,c47a0dd0,227,e790bcb4,...) at VOP_WRITE_APV+0x148 +vn_write(c45d5120,e790bcb4,c5802a00,0,c4b73a80,...) at vn_write+0x201 +dofilewrite(c4b73a80,1b,c45d5120,e790bcb4,,...) at dofilewrite+0x84 +kern_writev(c4b73a80,1b,e790bcb4,8220c71,0,...) at kern_writev+0x65 +write(c4b73a80,e790bd04,c,c07d899c,3,...) at write+0x4f +syscall(3b,3b,bfbf003b,0,bfbfeae4,...) at syscall+0x295 +Xint0x80_syscall() at Xint0x80_syscall+0x1f +--- syscall (4, FreeBSD ELF32, write), eip = 0x2831d727, esp = 0xbfbfea1c, ebp = 0xbfbfea48 --- Thank you for the report. The LOR is caused by my commit into sys/ufs/ufs/ufs_vnops.c, rev. 1.280. Is the mount lock really required, if all we're doing is a single read of a single word (mnt_kern_flags) (v_mount should be read-only for the whole lifetime of the vnode, I believe)? After all, reads of a single word are atomic on all our supported architectures. The only situation I see where there MIGHT be problems are forced unmounts, but I think there are bigger issues with those. Sorry for noticing this email only now. The problem is real with snapshotting. Ignoring MNTK_SUSPEND/MNTK_SUSPENDED flags (in particular, reading stale value of mnt_kern_flag) while setting IN_MODIFIED caused deadlock at ufs vnode inactivation time. This was the big trouble with nfsd and snapshots. As such, I think that precise value of mmnt_kern_flag is critical there, and mount interlock is needed. Practically speaking, I agree with claim that reading of m_k_f is surrounded by enough locked operations that would make sure that the read value is not stale. But there is no such guarantee on future/non-i386 arches, isn't it ? As a side note, mount interlock scope could be reduced there. Index: ufs/ufs/ufs_vnops.c === RCS file: /usr/local/arch/ncvs/src/sys/ufs/ufs/ufs_vnops.c,v retrieving revision 1.283 diff -u -r1.283 ufs_vnops.c --- ufs/ufs/ufs_vnops.c 6 Nov 2006 13:42:09 - 1.283 +++ ufs/ufs/ufs_vnops.c 12 Dec 2006 10:18:04 - @@ -133,19 +134,19 @@ { struct inode *ip; struct timespec ts; - int mnt_locked; ip = VTOI(vp); - mnt_locked = 0; if ((vp-v_mount-mnt_flag MNT_RDONLY) != 0) { VI_LOCK(vp); goto out; } MNT_ILOCK(vp-v_mount); /* For reading of mnt_kern_flags. */ - mnt_locked = 1; VI_LOCK(vp); - if ((ip-i_flag (IN_ACCESS | IN_CHANGE | IN_UPDATE)) == 0) - goto out_unl; + if ((ip-i_flag (IN_ACCESS | IN_CHANGE | IN_UPDATE)) == 0) { + MNT_IUNLOCK(vp-v_mount); + VI_UNLOCK(vp); + return; + } if ((vp-v_type == VBLK || vp-v_type == VCHR) !DOINGSOFTDEP(vp)) ip-i_flag |= IN_LAZYMOD; @@ -155,6 +156,7 @@ ip-i_flag |= IN_MODIFIED; else if (ip-i_flag IN_ACCESS) ip-i_flag |= IN_LAZYACCESS; + MNT_IUNLOCK(vp-v_mount); vfs_timestamp(ts); if (ip-i_flag IN_ACCESS) { DIP_SET(ip, i_atime, ts.tv_sec); @@ -172,10 +174,7 @@ out: ip-i_flag = ~(IN_ACCESS | IN_CHANGE | IN_UPDATE); - out_unl: VI_UNLOCK(vp); - if (mnt_locked) - MNT_IUNLOCK(vp-v_mount); } /* pgpkA3SWblcGR.pgp Description: PGP signature
Re: kqueue LOR
On Tue, 12 Dec 2006, Kostik Belousov wrote: On Tue, Dec 12, 2006 at 12:44:54AM -0800, Suleiman Souhlal wrote: Is the mount lock really required, if all we're doing is a single read of a single word (mnt_kern_flags) (v_mount should be read-only for the whole lifetime of the vnode, I believe)? After all, reads of a single word are atomic on all our supported architectures. The only situation I see where there MIGHT be problems are forced unmounts, but I think there are bigger issues with those. Sorry for noticing this email only now. The problem is real with snapshotting. Ignoring MNTK_SUSPEND/MNTK_SUSPENDED flags (in particular, reading stale value of mnt_kern_flag) while setting IN_MODIFIED caused deadlock at ufs vnode inactivation time. This was the big trouble with nfsd and snapshots. As such, I think that precise value of mmnt_kern_flag is critical there, and mount interlock is needed. Locking for just read is almost always bogus, but here (as in most cases) there is also a write based on the contents of the flag, and the lock is held across the write. Practically speaking, I agree with claim that reading of m_k_f is surrounded by enough locked operations that would make sure that the read value is not stale. But there is no such guarantee on future/non-i386 arches, isn't it ? I think not-very-staleness is implied by acquire/release semantics which are part of the API for most atomic operations. This behaviour doesn't seem to be documented for mutexes, but I don't see how mutexes could work without it (they have to synchronize all memory accesses, not just the memory accessed by the lock). As a side note, mount interlock scope could be reduced there. Index: ufs/ufs/ufs_vnops.c === RCS file: /usr/local/arch/ncvs/src/sys/ufs/ufs/ufs_vnops.c,v retrieving revision 1.283 diff -u -r1.283 ufs_vnops.c --- ufs/ufs/ufs_vnops.c 6 Nov 2006 13:42:09 - 1.283 +++ ufs/ufs/ufs_vnops.c 12 Dec 2006 10:18:04 - @@ -133,19 +134,19 @@ { struct inode *ip; struct timespec ts; - int mnt_locked; ip = VTOI(vp); - mnt_locked = 0; if ((vp-v_mount-mnt_flag MNT_RDONLY) != 0) { VI_LOCK(vp); goto out; } MNT_ILOCK(vp-v_mount); /* For reading of mnt_kern_flags. */ - mnt_locked = 1; VI_LOCK(vp); - if ((ip-i_flag (IN_ACCESS | IN_CHANGE | IN_UPDATE)) == 0) - goto out_unl; + if ((ip-i_flag (IN_ACCESS | IN_CHANGE | IN_UPDATE)) == 0) { + MNT_IUNLOCK(vp-v_mount); + VI_UNLOCK(vp); + return; + } The version that depends on not-very-staleness would test the flags without acquiring the lock(s) and return immediately in the usual case where none of the flags are set. It would have to acquire the locks and repeat the test to make changes (and the test is already repeated one flag at a time). I think this would be correct enough, but still inefficient and/or even messier. The current organization is usually: acquire vnode interlock in caller release vnode interlock in caller to avoid messes here (inefficient) call acquire mount interlock acquire vnode interlock test the flags; goto cleanup code if none set (usual case) do the work release vnode interlock release mount interlock return acquire vnode interlock (if needed) release vnode interlock (if needed) and it might become: acquire vnode interlock in caller call test the flags; return if none set (usual case) release vnode interlock // check that callers are aware of this acquire mount interlock acquire vnode interlock do the work // Assume no LOR problem for release, as below. // Otherwise need another relese+acquire of vnode interlock. release mount interlock return release vnode interlock if ((vp-v_type == VBLK || vp-v_type == VCHR) !DOINGSOFTDEP(vp)) ip-i_flag |= IN_LAZYMOD; @@ -155,6 +156,7 @@ ip-i_flag |= IN_MODIFIED; else if (ip-i_flag IN_ACCESS) ip-i_flag |= IN_LAZYACCESS; + MNT_IUNLOCK(vp-v_mount); vfs_timestamp(ts); if (ip-i_flag IN_ACCESS) { DIP_SET(ip, i_atime, ts.tv_sec); Is there no LOR problem for release? As I understand it, MNT_ILOCK() is only protecting IN_ACCESS being converted to IN_MODIFED, so after this conversion is done the lock is not needed. Is this correct? @@ -172,10 +174,7 @@ out: ip-i_flag = ~(IN_ACCESS | IN_CHANGE | IN_UPDATE); - out_unl: VI_UNLOCK(vp); - if (mnt_locked) - MNT_IUNLOCK(vp-v_mount); } /* BTW, vfs.lookup_shared defaults to 0 and decides shared access for all operations including read, so I wonder if there are [m]any bugs preventing shared accesses
Re: kqueue LOR
On Tue, Dec 12, 2006 at 11:49:42PM +1100, Bruce Evans wrote: On Tue, 12 Dec 2006, Kostik Belousov wrote: On Tue, Dec 12, 2006 at 12:44:54AM -0800, Suleiman Souhlal wrote: Is the mount lock really required, if all we're doing is a single read of a single word (mnt_kern_flags) (v_mount should be read-only for the whole lifetime of the vnode, I believe)? After all, reads of a single word are atomic on all our supported architectures. The only situation I see where there MIGHT be problems are forced unmounts, but I think there are bigger issues with those. Sorry for noticing this email only now. The problem is real with snapshotting. Ignoring MNTK_SUSPEND/MNTK_SUSPENDED flags (in particular, reading stale value of mnt_kern_flag) while setting IN_MODIFIED caused deadlock at ufs vnode inactivation time. This was the big trouble with nfsd and snapshots. As such, I think that precise value of mmnt_kern_flag is critical there, and mount interlock is needed. Locking for just read is almost always bogus, but here (as in most cases) there is also a write based on the contents of the flag, and the lock is held across the write. Practically speaking, I agree with claim that reading of m_k_f is surrounded by enough locked operations that would make sure that the read value is not stale. But there is no such guarantee on future/non-i386 arches, isn't it ? I think not-very-staleness is implied by acquire/release semantics which are part of the API for most atomic operations. This behaviour doesn't seem to be documented for mutexes, but I don't see how mutexes could work without it (they have to synchronize all memory accesses, not just the memory accessed by the lock). As a side note, mount interlock scope could be reduced there. Index: ufs/ufs/ufs_vnops.c === RCS file: /usr/local/arch/ncvs/src/sys/ufs/ufs/ufs_vnops.c,v retrieving revision 1.283 diff -u -r1.283 ufs_vnops.c --- ufs/ufs/ufs_vnops.c 6 Nov 2006 13:42:09 - 1.283 +++ ufs/ufs/ufs_vnops.c 12 Dec 2006 10:18:04 - @@ -133,19 +134,19 @@ { struct inode *ip; struct timespec ts; -int mnt_locked; ip = VTOI(vp); -mnt_locked = 0; if ((vp-v_mount-mnt_flag MNT_RDONLY) != 0) { VI_LOCK(vp); goto out; } MNT_ILOCK(vp-v_mount); /* For reading of mnt_kern_flags. */ -mnt_locked = 1; VI_LOCK(vp); -if ((ip-i_flag (IN_ACCESS | IN_CHANGE | IN_UPDATE)) == 0) -goto out_unl; +if ((ip-i_flag (IN_ACCESS | IN_CHANGE | IN_UPDATE)) == 0) { +MNT_IUNLOCK(vp-v_mount); +VI_UNLOCK(vp); +return; +} The version that depends on not-very-staleness would test the flags without acquiring the lock(s) and return immediately in the usual case where none of the flags are set. It would have to acquire the locks and repeat the test to make changes (and the test is already repeated one flag at a time). I think this would be correct enough, but still inefficient and/or even messier. The current organization is usually: acquire vnode interlock in caller release vnode interlock in caller to avoid messes here (inefficient) call acquire mount interlock acquire vnode interlock test the flags; goto cleanup code if none set (usual case) do the work release vnode interlock release mount interlock return acquire vnode interlock (if needed) release vnode interlock (if needed) and it might become: acquire vnode interlock in caller call test the flags; return if none set (usual case) release vnode interlock // check that callers are aware of this acquire mount interlock acquire vnode interlock do the work // Assume no LOR problem for release, as below. // Otherwise need another relese+acquire of vnode interlock. release mount interlock return release vnode interlock if ((vp-v_type == VBLK || vp-v_type == VCHR) !DOINGSOFTDEP(vp)) ip-i_flag |= IN_LAZYMOD; @@ -155,6 +156,7 @@ ip-i_flag |= IN_MODIFIED; else if (ip-i_flag IN_ACCESS) ip-i_flag |= IN_LAZYACCESS; +MNT_IUNLOCK(vp-v_mount); vfs_timestamp(ts); if (ip-i_flag IN_ACCESS) { DIP_SET(ip, i_atime, ts.tv_sec); Is there no LOR problem for release? AFAIK, no. Release is guaranteed to success without blocking. As I understand it, MNT_ILOCK() is only protecting IN_ACCESS being converted to IN_MODIFED, so after this conversion is done the lock is not needed. Is this correct? The code shall set IN_MODIFIED flag only if MNTK_SUSPEND and MNTK_SUSPENDED are clean. The vfs_write_suspend() that set MNTK_SUSPEND, does VFS_SYNC() after setting the flag. Since VFS_SYNC()
Re: kqueue LOR
2006/12/12, Kostik Belousov [EMAIL PROTECTED]: On Tue, Dec 12, 2006 at 12:44:54AM -0800, Suleiman Souhlal wrote: Kostik Belousov wrote: On Sun, Nov 26, 2006 at 09:30:39AM +0100, V??clav Haisman wrote: Hi, the attached lor.txt contains LOR I got this yesterday. It is FreeBSD 6.1 with relatively recent kernel, from last week or so. -- VH +lock order reversal: + 1st 0xc537f300 kqueue (kqueue) @ /usr/src/sys/kern/kern_event.c:1547 + 2nd 0xc45c22dc struct mount mtx (struct mount mtx) @ /usr/src/sys/ufs/ufs/ufs_vnops.c:138 +KDB: stack backtrace: +kdb_backtrace(c07f9879,c45c22dc,c07fd31c,c07fd31c,c080c7b2,...) at kdb_backtrace+0x2f +witness_checkorder(c45c22dc,9,c080c7b2,8a,c07fc6bd,...) at witness_checkorder+0x5fe +_mtx_lock_flags(c45c22dc,0,c080c7b2,8a,e790ba20,...) at _mtx_lock_flags+0x32 +ufs_itimes(c47a0dd0,c47a0e90,e790ba78,c060e1cc,c47a0dd0,...) at ufs_itimes+0x6c +ufs_getattr(e790ba54,e790baec,c0622af6,c0896f40,e790ba54,...) at ufs_getattr+0x20 +VOP_GETATTR_APV(c0896f40,e790ba54,c08a5760,c47a0dd0,e790ba74,...) at VOP_GETATTR_APV+0x3a +filt_vfsread(c4cf261c,6,c07f445e,60b,0,...) at filt_vfsread+0x75 +knote(c4f57114,6,1,1f30c2af,1f30c2af,...) at knote+0x75 +VOP_WRITE_APV(c0896f40,e790bbec,c47a0dd0,227,e790bcb4,...) at VOP_WRITE_APV+0x148 +vn_write(c45d5120,e790bcb4,c5802a00,0,c4b73a80,...) at vn_write+0x201 +dofilewrite(c4b73a80,1b,c45d5120,e790bcb4,,...) at dofilewrite+0x84 +kern_writev(c4b73a80,1b,e790bcb4,8220c71,0,...) at kern_writev+0x65 +write(c4b73a80,e790bd04,c,c07d899c,3,...) at write+0x4f +syscall(3b,3b,bfbf003b,0,bfbfeae4,...) at syscall+0x295 +Xint0x80_syscall() at Xint0x80_syscall+0x1f +--- syscall (4, FreeBSD ELF32, write), eip = 0x2831d727, esp = 0xbfbfea1c, ebp = 0xbfbfea48 --- Thank you for the report. The LOR is caused by my commit into sys/ufs/ufs/ufs_vnops.c, rev. 1.280. Is the mount lock really required, if all we're doing is a single read of a single word (mnt_kern_flags) (v_mount should be read-only for the whole lifetime of the vnode, I believe)? After all, reads of a single word are atomic on all our supported architectures. The only situation I see where there MIGHT be problems are forced unmounts, but I think there are bigger issues with those. Sorry for noticing this email only now. The problem is real with snapshotting. Ignoring MNTK_SUSPEND/MNTK_SUSPENDED flags (in particular, reading stale value of mnt_kern_flag) while setting IN_MODIFIED caused deadlock at ufs vnode inactivation time. This was the big trouble with nfsd and snapshots. As such, I think that precise value of mmnt_kern_flag is critical there, and mount interlock is needed. This can be avoided using a memory barrier when setting flags. Even if memory barriers usage is not encouraged, some critical code should really use them replacing a mutex semantic (if that worths it). Attilio -- Peace can only be achieved by understanding - A. Einstein ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to [EMAIL PROTECTED]
Re: kqueue LOR
Attilio Rao wrote: 2006/12/12, Kostik Belousov [EMAIL PROTECTED]: On Tue, Dec 12, 2006 at 12:44:54AM -0800, Suleiman Souhlal wrote: Kostik Belousov wrote: On Sun, Nov 26, 2006 at 09:30:39AM +0100, V??clav Haisman wrote: Hi, the attached lor.txt contains LOR I got this yesterday. It is FreeBSD 6.1 with relatively recent kernel, from last week or so. -- VH +lock order reversal: + 1st 0xc537f300 kqueue (kqueue) @ /usr/src/sys/kern/kern_event.c:1547 + 2nd 0xc45c22dc struct mount mtx (struct mount mtx) @ /usr/src/sys/ufs/ufs/ufs_vnops.c:138 +KDB: stack backtrace: +kdb_backtrace(c07f9879,c45c22dc,c07fd31c,c07fd31c,c080c7b2,...) at kdb_backtrace+0x2f +witness_checkorder(c45c22dc,9,c080c7b2,8a,c07fc6bd,...) at witness_checkorder+0x5fe +_mtx_lock_flags(c45c22dc,0,c080c7b2,8a,e790ba20,...) at _mtx_lock_flags+0x32 +ufs_itimes(c47a0dd0,c47a0e90,e790ba78,c060e1cc,c47a0dd0,...) at ufs_itimes+0x6c +ufs_getattr(e790ba54,e790baec,c0622af6,c0896f40,e790ba54,...) at ufs_getattr+0x20 +VOP_GETATTR_APV(c0896f40,e790ba54,c08a5760,c47a0dd0,e790ba74,...) at VOP_GETATTR_APV+0x3a +filt_vfsread(c4cf261c,6,c07f445e,60b,0,...) at filt_vfsread+0x75 +knote(c4f57114,6,1,1f30c2af,1f30c2af,...) at knote+0x75 +VOP_WRITE_APV(c0896f40,e790bbec,c47a0dd0,227,e790bcb4,...) at VOP_WRITE_APV+0x148 +vn_write(c45d5120,e790bcb4,c5802a00,0,c4b73a80,...) at vn_write+0x201 +dofilewrite(c4b73a80,1b,c45d5120,e790bcb4,,...) at dofilewrite+0x84 +kern_writev(c4b73a80,1b,e790bcb4,8220c71,0,...) at kern_writev+0x65 +write(c4b73a80,e790bd04,c,c07d899c,3,...) at write+0x4f +syscall(3b,3b,bfbf003b,0,bfbfeae4,...) at syscall+0x295 +Xint0x80_syscall() at Xint0x80_syscall+0x1f +--- syscall (4, FreeBSD ELF32, write), eip = 0x2831d727, esp = 0xbfbfea1c, ebp = 0xbfbfea48 --- Thank you for the report. The LOR is caused by my commit into sys/ufs/ufs/ufs_vnops.c, rev. 1.280. Is the mount lock really required, if all we're doing is a single read of a single word (mnt_kern_flags) (v_mount should be read-only for the whole lifetime of the vnode, I believe)? After all, reads of a single word are atomic on all our supported architectures. The only situation I see where there MIGHT be problems are forced unmounts, but I think there are bigger issues with those. Sorry for noticing this email only now. The problem is real with snapshotting. Ignoring MNTK_SUSPEND/MNTK_SUSPENDED flags (in particular, reading stale value of mnt_kern_flag) while setting IN_MODIFIED caused deadlock at ufs vnode inactivation time. This was the big trouble with nfsd and snapshots. As such, I think that precise value of mmnt_kern_flag is critical there, and mount interlock is needed. This can be avoided using a memory barrier when setting flags. Even if memory barriers usage is not encouraged, some critical code should really use them replacing a mutex semantic (if that worths it). Why is memory barrier usage not encouraged? As you said, they can be used to reduce the number of atomic (LOCKed) operations, in some cases. FWIW, Linux has rmb() (load mem barrier), wmb() (store mem barrier), mb() (load/store mem barrier), smp_rmb(), smp_wmb(), smp_mb() (mem barriers only needed on SMP), and barrier() (GCC barrier (__asm __volatile (:::memory)) macros that I've personally found very useful. Admittedly, they are harder to use than atomic operations, but it might still worth having something similar. -- Suleiman ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to [EMAIL PROTECTED]
Re: kqueue LOR
2006/12/12, Suleiman Souhlal [EMAIL PROTECTED]: Attilio Rao wrote: 2006/12/12, Kostik Belousov [EMAIL PROTECTED]: On Tue, Dec 12, 2006 at 12:44:54AM -0800, Suleiman Souhlal wrote: Kostik Belousov wrote: On Sun, Nov 26, 2006 at 09:30:39AM +0100, V??clav Haisman wrote: Hi, the attached lor.txt contains LOR I got this yesterday. It is FreeBSD 6.1 with relatively recent kernel, from last week or so. -- VH +lock order reversal: + 1st 0xc537f300 kqueue (kqueue) @ /usr/src/sys/kern/kern_event.c:1547 + 2nd 0xc45c22dc struct mount mtx (struct mount mtx) @ /usr/src/sys/ufs/ufs/ufs_vnops.c:138 +KDB: stack backtrace: +kdb_backtrace(c07f9879,c45c22dc,c07fd31c,c07fd31c,c080c7b2,...) at kdb_backtrace+0x2f +witness_checkorder(c45c22dc,9,c080c7b2,8a,c07fc6bd,...) at witness_checkorder+0x5fe +_mtx_lock_flags(c45c22dc,0,c080c7b2,8a,e790ba20,...) at _mtx_lock_flags+0x32 +ufs_itimes(c47a0dd0,c47a0e90,e790ba78,c060e1cc,c47a0dd0,...) at ufs_itimes+0x6c +ufs_getattr(e790ba54,e790baec,c0622af6,c0896f40,e790ba54,...) at ufs_getattr+0x20 +VOP_GETATTR_APV(c0896f40,e790ba54,c08a5760,c47a0dd0,e790ba74,...) at VOP_GETATTR_APV+0x3a +filt_vfsread(c4cf261c,6,c07f445e,60b,0,...) at filt_vfsread+0x75 +knote(c4f57114,6,1,1f30c2af,1f30c2af,...) at knote+0x75 +VOP_WRITE_APV(c0896f40,e790bbec,c47a0dd0,227,e790bcb4,...) at VOP_WRITE_APV+0x148 +vn_write(c45d5120,e790bcb4,c5802a00,0,c4b73a80,...) at vn_write+0x201 +dofilewrite(c4b73a80,1b,c45d5120,e790bcb4,,...) at dofilewrite+0x84 +kern_writev(c4b73a80,1b,e790bcb4,8220c71,0,...) at kern_writev+0x65 +write(c4b73a80,e790bd04,c,c07d899c,3,...) at write+0x4f +syscall(3b,3b,bfbf003b,0,bfbfeae4,...) at syscall+0x295 +Xint0x80_syscall() at Xint0x80_syscall+0x1f +--- syscall (4, FreeBSD ELF32, write), eip = 0x2831d727, esp = 0xbfbfea1c, ebp = 0xbfbfea48 --- Thank you for the report. The LOR is caused by my commit into sys/ufs/ufs/ufs_vnops.c, rev. 1.280. Is the mount lock really required, if all we're doing is a single read of a single word (mnt_kern_flags) (v_mount should be read-only for the whole lifetime of the vnode, I believe)? After all, reads of a single word are atomic on all our supported architectures. The only situation I see where there MIGHT be problems are forced unmounts, but I think there are bigger issues with those. Sorry for noticing this email only now. The problem is real with snapshotting. Ignoring MNTK_SUSPEND/MNTK_SUSPENDED flags (in particular, reading stale value of mnt_kern_flag) while setting IN_MODIFIED caused deadlock at ufs vnode inactivation time. This was the big trouble with nfsd and snapshots. As such, I think that precise value of mmnt_kern_flag is critical there, and mount interlock is needed. This can be avoided using a memory barrier when setting flags. Even if memory barriers usage is not encouraged, some critical code should really use them replacing a mutex semantic (if that worths it). Why is memory barrier usage not encouraged? As you said, they can be used to reduce the number of atomic (LOCKed) operations, in some cases. Beacause they can lead to errors as it is not so straightforward to understand when a memory barrier is needed more than an atomic instruction and so on (even if it doesn't value, for example, for ia32, for other architectures memory barriers could be more expensive than the atomic instruction, without counting a possible error). FWIW, Linux has rmb() (load mem barrier), wmb() (store mem barrier), mb() (load/store mem barrier), smp_rmb(), smp_wmb(), smp_mb() (mem barriers only needed on SMP), and barrier() (GCC barrier (__asm __volatile (:::memory)) macros that I've personally found very useful. I think that our memory barriers reflect the usage we do into the kernel as the base for building syncronizing primitives. From this point of view our atomic operations (meant into the wider possible sense, man 9 atomic) are more suitable than having something like Linux's smp_*(). Attilio -- Peace can only be achieved by understanding - A. Einstein ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to [EMAIL PROTECTED]
Re: kqueue LOR
On Tuesday 12 December 2006 13:43, Suleiman Souhlal wrote: Attilio Rao wrote: 2006/12/12, Kostik Belousov [EMAIL PROTECTED]: On Tue, Dec 12, 2006 at 12:44:54AM -0800, Suleiman Souhlal wrote: Kostik Belousov wrote: On Sun, Nov 26, 2006 at 09:30:39AM +0100, V??clav Haisman wrote: Hi, the attached lor.txt contains LOR I got this yesterday. It is FreeBSD 6.1 with relatively recent kernel, from last week or so. -- VH +lock order reversal: + 1st 0xc537f300 kqueue (kqueue) @ /usr/src/sys/kern/kern_event.c:1547 + 2nd 0xc45c22dc struct mount mtx (struct mount mtx) @ /usr/src/sys/ufs/ufs/ufs_vnops.c:138 +KDB: stack backtrace: +kdb_backtrace(c07f9879,c45c22dc,c07fd31c,c07fd31c,c080c7b2,...) at kdb_backtrace+0x2f +witness_checkorder(c45c22dc,9,c080c7b2,8a,c07fc6bd,...) at witness_checkorder+0x5fe +_mtx_lock_flags(c45c22dc,0,c080c7b2,8a,e790ba20,...) at _mtx_lock_flags+0x32 +ufs_itimes(c47a0dd0,c47a0e90,e790ba78,c060e1cc,c47a0dd0,...) at ufs_itimes+0x6c +ufs_getattr(e790ba54,e790baec,c0622af6,c0896f40,e790ba54,...) at ufs_getattr+0x20 +VOP_GETATTR_APV(c0896f40,e790ba54,c08a5760,c47a0dd0,e790ba74,...) at VOP_GETATTR_APV+0x3a +filt_vfsread(c4cf261c,6,c07f445e,60b,0,...) at filt_vfsread+0x75 +knote(c4f57114,6,1,1f30c2af,1f30c2af,...) at knote+0x75 +VOP_WRITE_APV(c0896f40,e790bbec,c47a0dd0,227,e790bcb4,...) at VOP_WRITE_APV+0x148 +vn_write(c45d5120,e790bcb4,c5802a00,0,c4b73a80,...) at vn_write+0x201 +dofilewrite(c4b73a80,1b,c45d5120,e790bcb4,,...) at dofilewrite+0x84 +kern_writev(c4b73a80,1b,e790bcb4,8220c71,0,...) at kern_writev+0x65 +write(c4b73a80,e790bd04,c,c07d899c,3,...) at write+0x4f +syscall(3b,3b,bfbf003b,0,bfbfeae4,...) at syscall+0x295 +Xint0x80_syscall() at Xint0x80_syscall+0x1f +--- syscall (4, FreeBSD ELF32, write), eip = 0x2831d727, esp = 0xbfbfea1c, ebp = 0xbfbfea48 --- Thank you for the report. The LOR is caused by my commit into sys/ufs/ufs/ufs_vnops.c, rev. 1.280. Is the mount lock really required, if all we're doing is a single read of a single word (mnt_kern_flags) (v_mount should be read-only for the whole lifetime of the vnode, I believe)? After all, reads of a single word are atomic on all our supported architectures. The only situation I see where there MIGHT be problems are forced unmounts, but I think there are bigger issues with those. Sorry for noticing this email only now. The problem is real with snapshotting. Ignoring MNTK_SUSPEND/MNTK_SUSPENDED flags (in particular, reading stale value of mnt_kern_flag) while setting IN_MODIFIED caused deadlock at ufs vnode inactivation time. This was the big trouble with nfsd and snapshots. As such, I think that precise value of mmnt_kern_flag is critical there, and mount interlock is needed. This can be avoided using a memory barrier when setting flags. Even if memory barriers usage is not encouraged, some critical code should really use them replacing a mutex semantic (if that worths it). Why is memory barrier usage not encouraged? As you said, they can be used to reduce the number of atomic (LOCKed) operations, in some cases. FWIW, Linux has rmb() (load mem barrier), wmb() (store mem barrier), mb() (load/store mem barrier), smp_rmb(), smp_wmb(), smp_mb() (mem barriers only needed on SMP), and barrier() (GCC barrier (__asm __volatile (:::memory)) macros that I've personally found very useful. Admittedly, they are harder to use than atomic operations, but it might still worth having something similar. Memory barriers just specify ordering, they don't ensure a cache flush so another CPU reads up to date values. You can use memory barriers in conjunction with atomic operations on a variable to ensure that you can safely read other variables (which is what locks do). For example, in this case IIUC, you have a race that is because there is shared state between two fields, one in the mount structure, and one in the ufs i-node. Memory barriers alone won't prevent you from operating on those flags non-consistently. That is, you have two memory locations in play here, and atomic ops only work on a single one. There isn't an atomic op to do read from memory location A, check flag B, and if it's true write C to memory location D. Where would you put the membar in this case to ensure that the action always results in consistent behavior? -- John Baldwin ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to [EMAIL PROTECTED]
Re: kqueue LOR
On Tue, 12 Dec 2006, John Baldwin wrote: On Tuesday 12 December 2006 13:43, Suleiman Souhlal wrote: Why is memory barrier usage not encouraged? As you said, they can be used to reduce the number of atomic (LOCKed) operations, in some cases. ... Admittedly, they are harder to use than atomic operations, but it might still worth having something similar. How would MI code know when using memory barriers is good? This is already hard to know for atomic ops -- if there would more than a couple of atomic ops then it is probably better to use 1 mutex lock/unlock and no atomic ops, since this reduces the total number of atomic ops in most cases, but it is hard for MI code to know how many a couple is. (This also depends on the SMP option -- without SMP, locking is automatic so atomic ops are very fast but mutexes are still slow since they do a lot more than an atomic op.) Memory barriers just specify ordering, they don't ensure a cache flush so another CPU reads up to date values. You can use memory barriers in conjunction with atomic operations on a variable to ensure that you can safely read other variables (which is what locks do). For example, in this I thought that the acquire/release variants of atomic ops guarantee this. They seem to be documented to do this, while mutexes don't seem to be documented to do this. The MI (?) implementation of mutexes depends on atomic_cmpset_{acq,rel}_ptr() doing this. Bruc ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to [EMAIL PROTECTED]
Re: kqueue LOR
Hmm, may be, since vnode must be interlocked by ffs_sync() after MNTK_SUSPENDED set, and before MNTK_SUSPEND, mount interlock is not needed in ufs_itimes. Tor ? If neither IN_CHANGE nor IN_UPDATE is set then it might be unsafe to set IN_MODIFIED in ufs_itimes() since the file system might be suspended or in the process of being suspended with the vnode sync loop in ffs_sync() having iterated past the vnode. I don't think the mount interlock is needed to check for MNTK_SUSPEND being set in ufs_itimes() as long as the vnode interlock is held. If a stale value is read without MNTK_SUSPEND set then the vnode sync loop in ffs_sync() cannot have iterated past the vnode, thus it should still be safe to set IN_MODIFIED. All writes by the CPU performing the vnode sync loop before it released the vnode interlock for the same vnode should be visible to the CPU in ufs_itimes() after it has obtained the vnode interlock. - Tor Egge ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to [EMAIL PROTECTED]
Re: kqueue LOR
On Sun, Nov 26, 2006 at 09:30:39AM +0100, V??clav Haisman wrote: Hi, the attached lor.txt contains LOR I got this yesterday. It is FreeBSD 6.1 with relatively recent kernel, from last week or so. -- VH +lock order reversal: + 1st 0xc537f300 kqueue (kqueue) @ /usr/src/sys/kern/kern_event.c:1547 + 2nd 0xc45c22dc struct mount mtx (struct mount mtx) @ /usr/src/sys/ufs/ufs/ufs_vnops.c:138 +KDB: stack backtrace: +kdb_backtrace(c07f9879,c45c22dc,c07fd31c,c07fd31c,c080c7b2,...) at kdb_backtrace+0x2f +witness_checkorder(c45c22dc,9,c080c7b2,8a,c07fc6bd,...) at witness_checkorder+0x5fe +_mtx_lock_flags(c45c22dc,0,c080c7b2,8a,e790ba20,...) at _mtx_lock_flags+0x32 +ufs_itimes(c47a0dd0,c47a0e90,e790ba78,c060e1cc,c47a0dd0,...) at ufs_itimes+0x6c +ufs_getattr(e790ba54,e790baec,c0622af6,c0896f40,e790ba54,...) at ufs_getattr+0x20 +VOP_GETATTR_APV(c0896f40,e790ba54,c08a5760,c47a0dd0,e790ba74,...) at VOP_GETATTR_APV+0x3a +filt_vfsread(c4cf261c,6,c07f445e,60b,0,...) at filt_vfsread+0x75 +knote(c4f57114,6,1,1f30c2af,1f30c2af,...) at knote+0x75 +VOP_WRITE_APV(c0896f40,e790bbec,c47a0dd0,227,e790bcb4,...) at VOP_WRITE_APV+0x148 +vn_write(c45d5120,e790bcb4,c5802a00,0,c4b73a80,...) at vn_write+0x201 +dofilewrite(c4b73a80,1b,c45d5120,e790bcb4,,...) at dofilewrite+0x84 +kern_writev(c4b73a80,1b,e790bcb4,8220c71,0,...) at kern_writev+0x65 +write(c4b73a80,e790bd04,c,c07d899c,3,...) at write+0x4f +syscall(3b,3b,bfbf003b,0,bfbfeae4,...) at syscall+0x295 +Xint0x80_syscall() at Xint0x80_syscall+0x1f +--- syscall (4, FreeBSD ELF32, write), eip = 0x2831d727, esp = 0xbfbfea1c, ebp = 0xbfbfea48 --- Thank you for the report. The LOR is caused by my commit into sys/ufs/ufs/ufs_vnops.c, rev. 1.280. What application you run that triggers the LOR ? Patch below is one possible approach to fixing it. Index: kern/vnode_if.src === RCS file: /usr/local/arch/ncvs/src/sys/kern/vnode_if.src,v retrieving revision 1.84 diff -u -r1.84 vnode_if.src --- kern/vnode_if.src 13 Nov 2006 05:51:22 - 1.84 +++ kern/vnode_if.src 26 Nov 2006 15:20:44 - @@ -164,6 +164,16 @@ }; +%% getattrfast vp L L L + +vop_getattrfast { + IN struct vnode *vp; + OUT struct vattr *vap; + IN struct ucred *cred; + IN struct thread *td; +}; + + %% setattr vp E E E %! setattr postvop_setattr_post Index: kern/vfs_default.c === RCS file: /usr/local/arch/ncvs/src/sys/kern/vfs_default.c,v retrieving revision 1.135 diff -u -r1.135 vfs_default.c --- kern/vfs_default.c 13 Nov 2006 05:51:22 - 1.135 +++ kern/vfs_default.c 26 Nov 2006 15:20:44 - @@ -62,6 +62,7 @@ static int vop_nolookup(struct vop_lookup_args *); static int vop_nostrategy(struct vop_strategy_args *); +static int vop_stdgetattrfast(struct vop_getattrfast_args *); /* * This vnode table stores what we want to do if the filesystem doesn't @@ -96,6 +97,7 @@ .vop_revoke = VOP_PANIC, .vop_strategy = vop_nostrategy, .vop_unlock = vop_stdunlock, + .vop_getattrfast = vop_stdgetattrfast, }; /* @@ -511,6 +513,19 @@ ap-a_sync, ap-a_rtvals); } +static int +vop_stdgetattrfast(ap) + struct vop_getattrfast_args /* { + struct vnode *vp; + struct vattr *vap; + struct ucred *cred; + struct thread *td; + } */ *ap; +{ + + return VOP_GETATTR(ap-a_vp, ap-a_vap, ap-a_cred, ap-a_td); +} + /* * vfs default ops * used to fill the vfs function table to get reasonable default return values. Index: kern/vfs_subr.c === RCS file: /usr/local/arch/ncvs/src/sys/kern/vfs_subr.c,v retrieving revision 1.692 diff -u -r1.692 vfs_subr.c --- kern/vfs_subr.c 13 Nov 2006 05:51:22 - 1.692 +++ kern/vfs_subr.c 26 Nov 2006 15:20:44 - @@ -3828,7 +3828,7 @@ return (1); } - if (VOP_GETATTR(vp, va, curthread-td_ucred, curthread)) + if (VOP_GETATTRFAST(vp, va, curthread-td_ucred, curthread)) return (0); kn-kn_data = va.va_size - kn-kn_fp-f_offset; Index: ufs/ufs/ufs_vnops.c === RCS file: /usr/local/arch/ncvs/src/sys/ufs/ufs/ufs_vnops.c,v retrieving revision 1.283 diff -u -r1.283 ufs_vnops.c --- ufs/ufs/ufs_vnops.c 6 Nov 2006 13:42:09 - 1.283 +++ ufs/ufs/ufs_vnops.c 26 Nov 2006 15:20:44 - @@ -97,6 +97,7 @@ static vop_close_t ufs_close; static vop_create_tufs_create; static vop_getattr_t ufs_getattr; +static vop_getattrfast_t ufs_getattrfast; static vop_link_t ufs_link; static int ufs_makeinode(int mode, struct vnode *, struct vnode **, struct componentname *); static vop_mkdir_t ufs_mkdir;
Re: kqueue LOR
Kostik Belousov wrote: On Sun, Nov 26, 2006 at 09:30:39AM +0100, V??clav Haisman wrote: [...] Thank you for the report. The LOR is caused by my commit into sys/ufs/ufs/ufs_vnops.c, rev. 1.280. What application you run that triggers the LOR ? Patch below is one I have no idea, I just found it in daily summary logs. possible approach to fixing it. [...] -- VH signature.asc Description: OpenPGP digital signature
kqueue LOR
Hi, the attached lor.txt contains LOR I got this yesterday. It is FreeBSD 6.1 with relatively recent kernel, from last week or so. -- VH +lock order reversal: + 1st 0xc537f300 kqueue (kqueue) @ /usr/src/sys/kern/kern_event.c:1547 + 2nd 0xc45c22dc struct mount mtx (struct mount mtx) @ /usr/src/sys/ufs/ufs/ufs_vnops.c:138 +KDB: stack backtrace: +kdb_backtrace(c07f9879,c45c22dc,c07fd31c,c07fd31c,c080c7b2,...) at kdb_backtrace+0x2f +witness_checkorder(c45c22dc,9,c080c7b2,8a,c07fc6bd,...) at witness_checkorder+0x5fe +_mtx_lock_flags(c45c22dc,0,c080c7b2,8a,e790ba20,...) at _mtx_lock_flags+0x32 +ufs_itimes(c47a0dd0,c47a0e90,e790ba78,c060e1cc,c47a0dd0,...) at ufs_itimes+0x6c +ufs_getattr(e790ba54,e790baec,c0622af6,c0896f40,e790ba54,...) at ufs_getattr+0x20 +VOP_GETATTR_APV(c0896f40,e790ba54,c08a5760,c47a0dd0,e790ba74,...) at VOP_GETATTR_APV+0x3a +filt_vfsread(c4cf261c,6,c07f445e,60b,0,...) at filt_vfsread+0x75 +knote(c4f57114,6,1,1f30c2af,1f30c2af,...) at knote+0x75 +VOP_WRITE_APV(c0896f40,e790bbec,c47a0dd0,227,e790bcb4,...) at VOP_WRITE_APV+0x148 +vn_write(c45d5120,e790bcb4,c5802a00,0,c4b73a80,...) at vn_write+0x201 +dofilewrite(c4b73a80,1b,c45d5120,e790bcb4,,...) at dofilewrite+0x84 +kern_writev(c4b73a80,1b,e790bcb4,8220c71,0,...) at kern_writev+0x65 +write(c4b73a80,e790bd04,c,c07d899c,3,...) at write+0x4f +syscall(3b,3b,bfbf003b,0,bfbfeae4,...) at syscall+0x295 +Xint0x80_syscall() at Xint0x80_syscall+0x1f +--- syscall (4, FreeBSD ELF32, write), eip = 0x2831d727, esp = 0xbfbfea1c, ebp = 0xbfbfea48 --- signature.asc Description: OpenPGP digital signature
Re: kqueue LOR
On Sun, 26 Nov 2006, Václav Haisman wrote: Hi, the attached lor.txt contains LOR I got this yesterday. It is FreeBSD 6.1 with relatively recent kernel, from last week or so. lock order reversal: 1st 0xc537f300 kqueue (kqueue) @ sys/kern/kern_event.c:1547 2nd 0xc45c22dc struct mount mtx (struct mount mtx) @ sys/ufs/ufs/ufs_vnops.c:138 Added to The LOR page with LOR ID # 193: http://sources.zabbadoz.net/freebsd/lor.html#193 -- Bjoern A. Zeeb bzeeb at Zabbadoz dot NeT___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to [EMAIL PROTECTED]