Re: kqueue LOR

2007-09-08 Thread Maxim Sobolev

Kostik Belousov wrote:

On Fri, Sep 07, 2007 at 11:49:50AM -0700, Maxim Sobolev wrote:

Hi,

On my 6.2 system I am seeing LOR discussed almost 1 year ago here:

http://lists.freebsd.org/pipermail/freebsd-stable/2006-November/031048.html
http://lists.freebsd.org/pipermail/freebsd-stable/2006-December/031197.html

lock order reversal:
 1st 0xc52cb500 kqueue (kqueue) @ kern/kern_event.c:1547
 2nd 0xc4e4d80c struct mount mtx (struct mount mtx) @ 
ufs/ufs/ufs_vnops.c:138


Do you have any plans to commit the suggested fix?

I suspect that the LOR is bogus. I was never able to get the information
where the reverse lock order happen. What I asked of the most reporters is
to apply sys/kern/subr_witness.c rev. 1.222 to RELENG_6 and provide me
with the LOR report, if any.


What do you mean bogus? It happens reliably on my system.


Note that doing that on RELENG_6_2 makes no sense, most likely you will
get LORs with cdev mutex, fixed in RELENG_6.


I don't quite understand that.

-Maxim
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: kqueue LOR

2007-09-08 Thread Kostik Belousov
On Sat, Sep 08, 2007 at 12:58:24AM -0700, Maxim Sobolev wrote:
 Kostik Belousov wrote:
 On Fri, Sep 07, 2007 at 11:49:50AM -0700, Maxim Sobolev wrote:
 Hi,
 
 On my 6.2 system I am seeing LOR discussed almost 1 year ago here:
 
 http://lists.freebsd.org/pipermail/freebsd-stable/2006-November/031048.html
 http://lists.freebsd.org/pipermail/freebsd-stable/2006-December/031197.html
 
 lock order reversal:
  1st 0xc52cb500 kqueue (kqueue) @ kern/kern_event.c:1547
  2nd 0xc4e4d80c struct mount mtx (struct mount mtx) @ 
 ufs/ufs/ufs_vnops.c:138
 
 Do you have any plans to commit the suggested fix?
 I suspect that the LOR is bogus. I was never able to get the information
 where the reverse lock order happen. What I asked of the most reporters is
 to apply sys/kern/subr_witness.c rev. 1.222 to RELENG_6 and provide me
 with the LOR report, if any.
 
 What do you mean bogus? It happens reliably on my system.
Bogus means that reported LOR, most likely, do not cause actual deadlock.

 
 Note that doing that on RELENG_6_2 makes no sense, most likely you will
 get LORs with cdev mutex, fixed in RELENG_6.
 
 I don't quite understand that.

RELENG_6_2 had known LOR that already was fixed in RELENG_6, between
cdev mutex and and sleep mtxpool, namely LOR #197. After user have
applyed rev. 1.222 of subr_witness, I usuallygot the reports of that
LOR, if any, but not the LOR I needed.


pgpbplIHuZo3G.pgp
Description: PGP signature


kqueue LOR

2007-09-07 Thread Maxim Sobolev

Hi,

On my 6.2 system I am seeing LOR discussed almost 1 year ago here:

http://lists.freebsd.org/pipermail/freebsd-stable/2006-November/031048.html
http://lists.freebsd.org/pipermail/freebsd-stable/2006-December/031197.html

lock order reversal:
 1st 0xc52cb500 kqueue (kqueue) @ kern/kern_event.c:1547
 2nd 0xc4e4d80c struct mount mtx (struct mount mtx) @ 
ufs/ufs/ufs_vnops.c:138


Do you have any plans to commit the suggested fix?

Thanks!

-Maxim
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: kqueue LOR

2007-09-07 Thread Kostik Belousov
On Fri, Sep 07, 2007 at 11:49:50AM -0700, Maxim Sobolev wrote:
 Hi,
 
 On my 6.2 system I am seeing LOR discussed almost 1 year ago here:
 
 http://lists.freebsd.org/pipermail/freebsd-stable/2006-November/031048.html
 http://lists.freebsd.org/pipermail/freebsd-stable/2006-December/031197.html
 
 lock order reversal:
  1st 0xc52cb500 kqueue (kqueue) @ kern/kern_event.c:1547
  2nd 0xc4e4d80c struct mount mtx (struct mount mtx) @ 
 ufs/ufs/ufs_vnops.c:138
 
 Do you have any plans to commit the suggested fix?
I suspect that the LOR is bogus. I was never able to get the information
where the reverse lock order happen. What I asked of the most reporters is
to apply sys/kern/subr_witness.c rev. 1.222 to RELENG_6 and provide me
with the LOR report, if any.

Note that doing that on RELENG_6_2 makes no sense, most likely you will
get LORs with cdev mutex, fixed in RELENG_6.


pgpPGMOEP0bmE.pgp
Description: PGP signature


Re: kqueue LOR

2007-02-08 Thread Frode Nordahl

On 27. nov. 2006, at 10.21, Kostik Belousov wrote:


On Sun, Nov 26, 2006 at 09:30:39AM +0100, V??clav Haisman wrote:

Hi,
the attached lor.txt contains LOR I got this yesterday. It is  
FreeBSD 6.1

with relatively recent kernel, from last week or so.

--
VH



+lock order reversal:
+ 1st 0xc537f300 kqueue (kqueue) @ /usr/src/sys/kern/kern_event.c: 
1547
+ 2nd 0xc45c22dc struct mount mtx (struct mount mtx) @ /usr/src/ 
sys/ufs/ufs/ufs_vnops.c:138

+KDB: stack backtrace:
+kdb_backtrace(c07f9879,c45c22dc,c07fd31c,c07fd31c,c080c7b2,...)  
at kdb_backtrace+0x2f
+witness_checkorder(c45c22dc,9,c080c7b2,8a,c07fc6bd,...) at  
witness_checkorder+0x5fe
+_mtx_lock_flags(c45c22dc,0,c080c7b2,8a,e790ba20,...) at  
_mtx_lock_flags+0x32
+ufs_itimes(c47a0dd0,c47a0e90,e790ba78,c060e1cc,c47a0dd0,...) at  
ufs_itimes+0x6c
+ufs_getattr(e790ba54,e790baec,c0622af6,c0896f40,e790ba54,...) at  
ufs_getattr+0x20
+VOP_GETATTR_APV(c0896f40,e790ba54,c08a5760,c47a0dd0,e790ba74,...)  
at VOP_GETATTR_APV+0x3a

+filt_vfsread(c4cf261c,6,c07f445e,60b,0,...) at filt_vfsread+0x75
+knote(c4f57114,6,1,1f30c2af,1f30c2af,...) at knote+0x75
+VOP_WRITE_APV(c0896f40,e790bbec,c47a0dd0,227,e790bcb4,...) at  
VOP_WRITE_APV+0x148
+vn_write(c45d5120,e790bcb4,c5802a00,0,c4b73a80,...) at vn_write 
+0x201
+dofilewrite(c4b73a80,1b,c45d5120,e790bcb4,,...) at  
dofilewrite+0x84

+kern_writev(c4b73a80,1b,e790bcb4,8220c71,0,...) at kern_writev+0x65
+write(c4b73a80,e790bd04,c,c07d899c,3,...) at write+0x4f
+syscall(3b,3b,bfbf003b,0,bfbfeae4,...) at syscall+0x295
+Xint0x80_syscall() at Xint0x80_syscall+0x1f
+--- syscall (4, FreeBSD ELF32, write), eip = 0x2831d727, esp =  
0xbfbfea1c, ebp = 0xbfbfea48 ---


Thank you for the report. The LOR is caused by my commit into
sys/ufs/ufs/ufs_vnops.c, rev. 1.280.


While debugging a problem I have with 6.2-RELEASE on one of my  
servers I saw this LOR. After being up for a short while the server  
freezes, not responding to serial console, network og keyboard. I  
can't even get to DDB by sending BREAK on the serial console.


Enabling INVARIANTS, INVARIANT_SUPPORT, WITNESS and WITNESS_SKIPSPIN  
did not give more information about the freeze other than printing  
the LOR now and then.


The LOR I am getting is exactly the same except the calls are made to  
writev instead of write.



What application you run that triggers the LOR ? Patch below is one
possible approach to fixing it.


I am seeing this on a front-end MX server, I can trigger it by  
running tail -f /var/log/maillog, the LOR is printed before any  
output is printed by tail. After triggering it once, it will not  
trigger regularilly until waiting for some time.


Waiting 180 seconds seems to be good to make it happen every time,  
but it can be triggered earlier. My maillog grows about 976K during  
that time.


May this LOR have something to do with the system freze I am  
experiencing? Should I try the patch in your mail from november 27.  
or december 13? Or has some other fix emerged since then?


--
Frode Nordahl



___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: kqueue LOR

2007-02-08 Thread Kostik Belousov
On Thu, Feb 08, 2007 at 09:49:23PM +0100, Frode Nordahl wrote:
 On 27. nov. 2006, at 10.21, Kostik Belousov wrote:
 
 On Sun, Nov 26, 2006 at 09:30:39AM +0100, V??clav Haisman wrote:
 Hi,
 the attached lor.txt contains LOR I got this yesterday. It is  
 FreeBSD 6.1
 with relatively recent kernel, from last week or so.
 
 --
 VH
 
 +lock order reversal:
 + 1st 0xc537f300 kqueue (kqueue) @ /usr/src/sys/kern/kern_event.c: 
 1547
 + 2nd 0xc45c22dc struct mount mtx (struct mount mtx) @ /usr/src/ 
 sys/ufs/ufs/ufs_vnops.c:138
 +KDB: stack backtrace:
 +kdb_backtrace(c07f9879,c45c22dc,c07fd31c,c07fd31c,c080c7b2,...)  
 at kdb_backtrace+0x2f
 +witness_checkorder(c45c22dc,9,c080c7b2,8a,c07fc6bd,...) at  
 witness_checkorder+0x5fe
 +_mtx_lock_flags(c45c22dc,0,c080c7b2,8a,e790ba20,...) at  
 _mtx_lock_flags+0x32
 +ufs_itimes(c47a0dd0,c47a0e90,e790ba78,c060e1cc,c47a0dd0,...) at  
 ufs_itimes+0x6c
 +ufs_getattr(e790ba54,e790baec,c0622af6,c0896f40,e790ba54,...) at  
 ufs_getattr+0x20
 +VOP_GETATTR_APV(c0896f40,e790ba54,c08a5760,c47a0dd0,e790ba74,...)  
 at VOP_GETATTR_APV+0x3a
 +filt_vfsread(c4cf261c,6,c07f445e,60b,0,...) at filt_vfsread+0x75
 +knote(c4f57114,6,1,1f30c2af,1f30c2af,...) at knote+0x75
 +VOP_WRITE_APV(c0896f40,e790bbec,c47a0dd0,227,e790bcb4,...) at  
 VOP_WRITE_APV+0x148
 +vn_write(c45d5120,e790bcb4,c5802a00,0,c4b73a80,...) at vn_write 
 +0x201
 +dofilewrite(c4b73a80,1b,c45d5120,e790bcb4,,...) at  
 dofilewrite+0x84
 +kern_writev(c4b73a80,1b,e790bcb4,8220c71,0,...) at kern_writev+0x65
 +write(c4b73a80,e790bd04,c,c07d899c,3,...) at write+0x4f
 +syscall(3b,3b,bfbf003b,0,bfbfeae4,...) at syscall+0x295
 +Xint0x80_syscall() at Xint0x80_syscall+0x1f
 +--- syscall (4, FreeBSD ELF32, write), eip = 0x2831d727, esp =  
 0xbfbfea1c, ebp = 0xbfbfea48 ---
 
 Thank you for the report. The LOR is caused by my commit into
 sys/ufs/ufs/ufs_vnops.c, rev. 1.280.
 
 While debugging a problem I have with 6.2-RELEASE on one of my  
 servers I saw this LOR. After being up for a short while the server  
 freezes, not responding to serial console, network og keyboard. I  
 can't even get to DDB by sending BREAK on the serial console.
 
 Enabling INVARIANTS, INVARIANT_SUPPORT, WITNESS and WITNESS_SKIPSPIN  
 did not give more information about the freeze other than printing  
 the LOR now and then.
The useful way to report deadlock is described in developer handbook,
kernel debug chapter, deadlock debugging. If the freeze caused by this LOR,
then supplying me with information requested in that chapter could be
helpful (both to you and me).


pgpsIgstFJaJP.pgp
Description: PGP signature


Re: kqueue LOR

2006-12-13 Thread Kostik Belousov
On Wed, Dec 13, 2006 at 04:12:57AM +, Tor Egge wrote:
  Hmm, may be, since vnode must be interlocked by ffs_sync() after
  MNTK_SUSPENDED set, and before MNTK_SUSPEND, mount interlock is not
  needed in ufs_itimes.
  
  Tor ?

 If neither IN_CHANGE nor IN_UPDATE is set then it might be unsafe
 to set IN_MODIFIED in ufs_itimes() since the file system might be
 suspended or in the process of being suspended with the vnode sync
 loop in ffs_sync() having iterated past the vnode.
This was exactly the reason for IN_LAZYACCESS flag introduction. It is
set when fs is suspending or suspended, and no IN_CHANGE or IN_UPDATE
flags are set.

 I don't think the mount interlock is needed to check for MNTK_SUSPEND
 being set in ufs_itimes() as long as the vnode interlock is held. If
 a stale value is read without MNTK_SUSPEND set then the vnode sync
 loop in ffs_sync() cannot have iterated past the vnode, thus it should
 still be safe to set IN_MODIFIED. All writes by the CPU performing
 the vnode sync loop before it released the vnode interlock for the
 same vnode should be visible to the CPU in ufs_itimes() after it has
 obtained the vnode interlock.

I think that you statement is valid for both MNTK_SUSPEND and
MNTK_SUSPENDED flags. In other words, aquision of mount interlock could
be safely removed from the ufs_itimes(), as was suggested by [EMAIL PROTECTED]

Index: ufs/ufs/ufs_vnops.c
===
RCS file: /usr/local/arch/ncvs/src/sys/ufs/ufs/ufs_vnops.c,v
retrieving revision 1.283
diff -u -r1.283 ufs_vnops.c
--- ufs/ufs/ufs_vnops.c 6 Nov 2006 13:42:09 -   1.283
+++ ufs/ufs/ufs_vnops.c 13 Dec 2006 14:10:31 -
@@ -133,19 +133,15 @@
 {
struct inode *ip;
struct timespec ts;
-   int mnt_locked;
 
ip = VTOI(vp);
-   mnt_locked = 0;
-   if ((vp-v_mount-mnt_flag  MNT_RDONLY) != 0) {
-   VI_LOCK(vp);
+   VI_LOCK(vp);
+   if ((vp-v_mount-mnt_flag  MNT_RDONLY) != 0)
goto out;
+   if ((ip-i_flag  (IN_ACCESS | IN_CHANGE | IN_UPDATE)) == 0) {
+   VI_UNLOCK(vp);
+   return;
}
-   MNT_ILOCK(vp-v_mount); /* For reading of mnt_kern_flags. */
-   mnt_locked = 1;
-   VI_LOCK(vp);
-   if ((ip-i_flag  (IN_ACCESS | IN_CHANGE | IN_UPDATE)) == 0)
-   goto out_unl;
 
if ((vp-v_type == VBLK || vp-v_type == VCHR)  !DOINGSOFTDEP(vp))
ip-i_flag |= IN_LAZYMOD;
@@ -172,10 +168,7 @@
 
  out:
ip-i_flag = ~(IN_ACCESS | IN_CHANGE | IN_UPDATE);
- out_unl:
VI_UNLOCK(vp);
-   if (mnt_locked)
-   MNT_IUNLOCK(vp-v_mount);
 }
 
 /*

[ It seems that MNTK_SUSPENDED flag implies MNTK_SUSPEND. ]



pgpMGzCo3rusc.pgp
Description: PGP signature


Re: kqueue LOR

2006-12-13 Thread John Baldwin
On Tuesday 12 December 2006 21:48, Bruce Evans wrote:
  Memory barriers just specify ordering, they don't ensure a cache flush so
  another CPU reads up to date values.  You can use memory barriers in
  conjunction with atomic operations on a variable to ensure that you can
  safely read other variables (which is what locks do).  For example, in 
this
 
 I thought that the acquire/release variants of atomic ops guarantee
 this.  They seem to be documented to do this, while mutexes don't seem
 to be documented to do this.  The MI (?) implementation of mutexes
 depends on atomic_cmpset_{acq,rel}_ptr() doing this.

The acq/rel just specify ordering.  As Attilio mentioned, we assume that the 
atomic_cmpset() that sets the contested flag will fail while racing with 
another CPU (even if the CPU can't see the new value, as long as it fails and 
keeps spinning mutexes will still work).  The 'rel' barrier on CPU A when 
releasing a lock forces all the other writes to be posted (and eventually 
become visible) to other CPUs before the write that releases the lock.  
The 'acq' barrier on CPU B when acquiring the lock forces the CPU to not 
reorder any reads before it acquires the lock, so this makes you not read any 
data until you have the lock.  Thus, once CPU B has waited long enough 
to see the write from A to release the lock, we know that 1) it can 
also see all the other writes from that CPU that the lock protected, and 2) 
B hasn't tried to read any of them yet so it shouldn't have any stale values 
in registers.  None of this requires the OS to do a cache flush.  (If you 
have an SMP system where the cache can still hold stale values after another 
CPU updates values in memory where it is visible to the CPU acquiring the 
lock, then _acq might need to flush the cache, but that would be a property 
of that architecture.  However, even that would not require cache flushes so 
long as the stale values were evicted from the cache such that they honor the 
memory barrier and you don't see the new value of the lock until you see the 
new values of the earlier data.)

-- 
John Baldwin
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: kqueue LOR

2006-12-12 Thread Suleiman Souhlal

Kostik Belousov wrote:

On Sun, Nov 26, 2006 at 09:30:39AM +0100, V??clav Haisman wrote:


Hi,
the attached lor.txt contains LOR I got this yesterday. It is FreeBSD 6.1
with relatively recent kernel, from last week or so.

--
VH




+lock order reversal:
+ 1st 0xc537f300 kqueue (kqueue) @ /usr/src/sys/kern/kern_event.c:1547
+ 2nd 0xc45c22dc struct mount mtx (struct mount mtx) @ 
/usr/src/sys/ufs/ufs/ufs_vnops.c:138
+KDB: stack backtrace:
+kdb_backtrace(c07f9879,c45c22dc,c07fd31c,c07fd31c,c080c7b2,...) at 
kdb_backtrace+0x2f
+witness_checkorder(c45c22dc,9,c080c7b2,8a,c07fc6bd,...) at 
witness_checkorder+0x5fe
+_mtx_lock_flags(c45c22dc,0,c080c7b2,8a,e790ba20,...) at _mtx_lock_flags+0x32
+ufs_itimes(c47a0dd0,c47a0e90,e790ba78,c060e1cc,c47a0dd0,...) at ufs_itimes+0x6c
+ufs_getattr(e790ba54,e790baec,c0622af6,c0896f40,e790ba54,...) at 
ufs_getattr+0x20
+VOP_GETATTR_APV(c0896f40,e790ba54,c08a5760,c47a0dd0,e790ba74,...) at 
VOP_GETATTR_APV+0x3a
+filt_vfsread(c4cf261c,6,c07f445e,60b,0,...) at filt_vfsread+0x75
+knote(c4f57114,6,1,1f30c2af,1f30c2af,...) at knote+0x75
+VOP_WRITE_APV(c0896f40,e790bbec,c47a0dd0,227,e790bcb4,...) at 
VOP_WRITE_APV+0x148
+vn_write(c45d5120,e790bcb4,c5802a00,0,c4b73a80,...) at vn_write+0x201
+dofilewrite(c4b73a80,1b,c45d5120,e790bcb4,,...) at dofilewrite+0x84
+kern_writev(c4b73a80,1b,e790bcb4,8220c71,0,...) at kern_writev+0x65
+write(c4b73a80,e790bd04,c,c07d899c,3,...) at write+0x4f
+syscall(3b,3b,bfbf003b,0,bfbfeae4,...) at syscall+0x295
+Xint0x80_syscall() at Xint0x80_syscall+0x1f
+--- syscall (4, FreeBSD ELF32, write), eip = 0x2831d727, esp = 0xbfbfea1c, ebp 
= 0xbfbfea48 ---



Thank you for the report. The LOR is caused by my commit into
sys/ufs/ufs/ufs_vnops.c, rev. 1.280.


Is the mount lock really required, if all we're doing is a single read of a 
single word (mnt_kern_flags) (v_mount should be read-only for the whole 
lifetime of the vnode, I believe)? After all, reads of a single word are atomic 
on all our supported architectures.
The only situation I see where there MIGHT be problems are forced unmounts, but 
I think there are bigger issues with those.
Sorry for noticing this email only now.

-- Suleiman
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: kqueue LOR

2006-12-12 Thread Kostik Belousov
On Tue, Dec 12, 2006 at 12:44:54AM -0800, Suleiman Souhlal wrote:
 Kostik Belousov wrote:
 On Sun, Nov 26, 2006 at 09:30:39AM +0100, V??clav Haisman wrote:
 
 Hi,
 the attached lor.txt contains LOR I got this yesterday. It is FreeBSD 6.1
 with relatively recent kernel, from last week or so.
 
 --
 VH
 
 
 +lock order reversal:
 + 1st 0xc537f300 kqueue (kqueue) @ /usr/src/sys/kern/kern_event.c:1547
 + 2nd 0xc45c22dc struct mount mtx (struct mount mtx) @ 
 /usr/src/sys/ufs/ufs/ufs_vnops.c:138
 +KDB: stack backtrace:
 +kdb_backtrace(c07f9879,c45c22dc,c07fd31c,c07fd31c,c080c7b2,...) at 
 kdb_backtrace+0x2f
 +witness_checkorder(c45c22dc,9,c080c7b2,8a,c07fc6bd,...) at 
 witness_checkorder+0x5fe
 +_mtx_lock_flags(c45c22dc,0,c080c7b2,8a,e790ba20,...) at 
 _mtx_lock_flags+0x32
 +ufs_itimes(c47a0dd0,c47a0e90,e790ba78,c060e1cc,c47a0dd0,...) at 
 ufs_itimes+0x6c
 +ufs_getattr(e790ba54,e790baec,c0622af6,c0896f40,e790ba54,...) at 
 ufs_getattr+0x20
 +VOP_GETATTR_APV(c0896f40,e790ba54,c08a5760,c47a0dd0,e790ba74,...) at 
 VOP_GETATTR_APV+0x3a
 +filt_vfsread(c4cf261c,6,c07f445e,60b,0,...) at filt_vfsread+0x75
 +knote(c4f57114,6,1,1f30c2af,1f30c2af,...) at knote+0x75
 +VOP_WRITE_APV(c0896f40,e790bbec,c47a0dd0,227,e790bcb4,...) at 
 VOP_WRITE_APV+0x148
 +vn_write(c45d5120,e790bcb4,c5802a00,0,c4b73a80,...) at vn_write+0x201
 +dofilewrite(c4b73a80,1b,c45d5120,e790bcb4,,...) at 
 dofilewrite+0x84
 +kern_writev(c4b73a80,1b,e790bcb4,8220c71,0,...) at kern_writev+0x65
 +write(c4b73a80,e790bd04,c,c07d899c,3,...) at write+0x4f
 +syscall(3b,3b,bfbf003b,0,bfbfeae4,...) at syscall+0x295
 +Xint0x80_syscall() at Xint0x80_syscall+0x1f
 +--- syscall (4, FreeBSD ELF32, write), eip = 0x2831d727, esp = 
 0xbfbfea1c, ebp = 0xbfbfea48 ---
 
 
 Thank you for the report. The LOR is caused by my commit into
 sys/ufs/ufs/ufs_vnops.c, rev. 1.280.
 
 Is the mount lock really required, if all we're doing is a single read of a 
 single word (mnt_kern_flags) (v_mount should be read-only for the whole 
 lifetime of the vnode, I believe)? After all, reads of a single word are 
 atomic on all our supported architectures.
 The only situation I see where there MIGHT be problems are forced unmounts, 
 but I think there are bigger issues with those.
 Sorry for noticing this email only now.

The problem is real with snapshotting. Ignoring
MNTK_SUSPEND/MNTK_SUSPENDED flags (in particular, reading stale value of
mnt_kern_flag) while setting IN_MODIFIED caused deadlock at ufs vnode
inactivation time. This was the big trouble with nfsd and snapshots. As
such, I think that precise value of mmnt_kern_flag is critical there,
and mount interlock is needed.

Practically speaking, I agree with claim that reading of m_k_f is
surrounded by enough locked operations that would make sure that
the read value is not stale. But there is no such guarantee on
future/non-i386 arches, isn't it ?

As a side note, mount interlock scope could be reduced there.

Index: ufs/ufs/ufs_vnops.c
===
RCS file: /usr/local/arch/ncvs/src/sys/ufs/ufs/ufs_vnops.c,v
retrieving revision 1.283
diff -u -r1.283 ufs_vnops.c
--- ufs/ufs/ufs_vnops.c 6 Nov 2006 13:42:09 -   1.283
+++ ufs/ufs/ufs_vnops.c 12 Dec 2006 10:18:04 -
@@ -133,19 +134,19 @@
 {
struct inode *ip;
struct timespec ts;
-   int mnt_locked;
 
ip = VTOI(vp);
-   mnt_locked = 0;
if ((vp-v_mount-mnt_flag  MNT_RDONLY) != 0) {
VI_LOCK(vp);
goto out;
}
MNT_ILOCK(vp-v_mount); /* For reading of mnt_kern_flags. */
-   mnt_locked = 1;
VI_LOCK(vp);
-   if ((ip-i_flag  (IN_ACCESS | IN_CHANGE | IN_UPDATE)) == 0)
-   goto out_unl;
+   if ((ip-i_flag  (IN_ACCESS | IN_CHANGE | IN_UPDATE)) == 0) {
+   MNT_IUNLOCK(vp-v_mount);
+   VI_UNLOCK(vp);
+   return;
+   }
 
if ((vp-v_type == VBLK || vp-v_type == VCHR)  !DOINGSOFTDEP(vp))
ip-i_flag |= IN_LAZYMOD;
@@ -155,6 +156,7 @@
ip-i_flag |= IN_MODIFIED;
else if (ip-i_flag  IN_ACCESS)
ip-i_flag |= IN_LAZYACCESS;
+   MNT_IUNLOCK(vp-v_mount);
vfs_timestamp(ts);
if (ip-i_flag  IN_ACCESS) {
DIP_SET(ip, i_atime, ts.tv_sec);
@@ -172,10 +174,7 @@
 
  out:
ip-i_flag = ~(IN_ACCESS | IN_CHANGE | IN_UPDATE);
- out_unl:
VI_UNLOCK(vp);
-   if (mnt_locked)
-   MNT_IUNLOCK(vp-v_mount);
 }
 
 /*


pgpkA3SWblcGR.pgp
Description: PGP signature


Re: kqueue LOR

2006-12-12 Thread Bruce Evans

On Tue, 12 Dec 2006, Kostik Belousov wrote:


On Tue, Dec 12, 2006 at 12:44:54AM -0800, Suleiman Souhlal wrote:



Is the mount lock really required, if all we're doing is a single read of a
single word (mnt_kern_flags) (v_mount should be read-only for the whole
lifetime of the vnode, I believe)? After all, reads of a single word are
atomic on all our supported architectures.
The only situation I see where there MIGHT be problems are forced unmounts,
but I think there are bigger issues with those.
Sorry for noticing this email only now.


The problem is real with snapshotting. Ignoring
MNTK_SUSPEND/MNTK_SUSPENDED flags (in particular, reading stale value of
mnt_kern_flag) while setting IN_MODIFIED caused deadlock at ufs vnode
inactivation time. This was the big trouble with nfsd and snapshots. As
such, I think that precise value of mmnt_kern_flag is critical there,
and mount interlock is needed.


Locking for just read is almost always bogus, but here (as in most
cases) there is also a write based on the contents of the flag, and
the lock is held across the write.


Practically speaking, I agree with claim that reading of m_k_f is
surrounded by enough locked operations that would make sure that
the read value is not stale. But there is no such guarantee on
future/non-i386 arches, isn't it ?


I think not-very-staleness is implied by acquire/release semantics
which are part of the API for most atomic operations.  This behaviour
doesn't seem to be documented for mutexes, but I don't see how mutexes
could work without it (they have to synchronize all memory accesses,
not just the memory accessed by the lock).


As a side note, mount interlock scope could be reduced there.

Index: ufs/ufs/ufs_vnops.c
===
RCS file: /usr/local/arch/ncvs/src/sys/ufs/ufs/ufs_vnops.c,v
retrieving revision 1.283
diff -u -r1.283 ufs_vnops.c
--- ufs/ufs/ufs_vnops.c 6 Nov 2006 13:42:09 -   1.283
+++ ufs/ufs/ufs_vnops.c 12 Dec 2006 10:18:04 -
@@ -133,19 +134,19 @@
{
struct inode *ip;
struct timespec ts;
-   int mnt_locked;

ip = VTOI(vp);
-   mnt_locked = 0;
if ((vp-v_mount-mnt_flag  MNT_RDONLY) != 0) {
VI_LOCK(vp);
goto out;
}
MNT_ILOCK(vp-v_mount);  /* For reading of mnt_kern_flags. 
*/
-   mnt_locked = 1;
VI_LOCK(vp);
-   if ((ip-i_flag  (IN_ACCESS | IN_CHANGE | IN_UPDATE)) == 0)
-   goto out_unl;
+   if ((ip-i_flag  (IN_ACCESS | IN_CHANGE | IN_UPDATE)) == 0) {
+   MNT_IUNLOCK(vp-v_mount);
+   VI_UNLOCK(vp);
+   return;
+   }


The version that depends on not-very-staleness would test the flags
without acquiring the lock(s) and return immediately in the usual case
where none of the flags are set.  It would have to acquire the locks
and repeat the test to make changes (and the test is already repeated
one flag at a time).  I think this would be correct enough, but still
inefficient and/or even messier.  The current organization is usually:

acquire vnode interlock in caller
release vnode interlock in caller to avoid messes here (inefficient)
call
acquire mount interlock
acquire vnode interlock
test the flags; goto cleanup code if none set (usual case)
do the work
release vnode interlock
release mount interlock
return
acquire vnode interlock (if needed)
release vnode interlock (if needed)

and it might become:

acquire vnode interlock in caller
call
test the flags; return if none set (usual case)
release vnode interlock // check that callers are aware of this
acquire mount interlock
acquire vnode interlock
do the work
// Assume no LOR problem for release, as below.
// Otherwise need another relese+acquire of vnode interlock.
release mount interlock
return
release vnode interlock



if ((vp-v_type == VBLK || vp-v_type == VCHR)  !DOINGSOFTDEP(vp))
ip-i_flag |= IN_LAZYMOD;
@@ -155,6 +156,7 @@
ip-i_flag |= IN_MODIFIED;
else if (ip-i_flag  IN_ACCESS)
ip-i_flag |= IN_LAZYACCESS;
+   MNT_IUNLOCK(vp-v_mount);
vfs_timestamp(ts);
if (ip-i_flag  IN_ACCESS) {
DIP_SET(ip, i_atime, ts.tv_sec);


Is there no LOR problem for release?

As I understand it, MNT_ILOCK() is only protecting IN_ACCESS being
converted to IN_MODIFED, so after this conversion is done the lock
is not needed.  Is this correct?


@@ -172,10 +174,7 @@

 out:
ip-i_flag = ~(IN_ACCESS | IN_CHANGE | IN_UPDATE);
- out_unl:
VI_UNLOCK(vp);
-   if (mnt_locked)
-   MNT_IUNLOCK(vp-v_mount);
}

/*



BTW, vfs.lookup_shared defaults to 0 and decides shared access for all
operations including read, so I wonder if there are [m]any bugs
preventing shared accesses 

Re: kqueue LOR

2006-12-12 Thread Kostik Belousov
On Tue, Dec 12, 2006 at 11:49:42PM +1100, Bruce Evans wrote:
 On Tue, 12 Dec 2006, Kostik Belousov wrote:
 
 On Tue, Dec 12, 2006 at 12:44:54AM -0800, Suleiman Souhlal wrote:
 
 Is the mount lock really required, if all we're doing is a single read of 
 a
 single word (mnt_kern_flags) (v_mount should be read-only for the whole
 lifetime of the vnode, I believe)? After all, reads of a single word are
 atomic on all our supported architectures.
 The only situation I see where there MIGHT be problems are forced 
 unmounts,
 but I think there are bigger issues with those.
 Sorry for noticing this email only now.
 
 The problem is real with snapshotting. Ignoring
 MNTK_SUSPEND/MNTK_SUSPENDED flags (in particular, reading stale value of
 mnt_kern_flag) while setting IN_MODIFIED caused deadlock at ufs vnode
 inactivation time. This was the big trouble with nfsd and snapshots. As
 such, I think that precise value of mmnt_kern_flag is critical there,
 and mount interlock is needed.
 
 Locking for just read is almost always bogus, but here (as in most
 cases) there is also a write based on the contents of the flag, and
 the lock is held across the write.
 
 Practically speaking, I agree with claim that reading of m_k_f is
 surrounded by enough locked operations that would make sure that
 the read value is not stale. But there is no such guarantee on
 future/non-i386 arches, isn't it ?
 
 I think not-very-staleness is implied by acquire/release semantics
 which are part of the API for most atomic operations.  This behaviour
 doesn't seem to be documented for mutexes, but I don't see how mutexes
 could work without it (they have to synchronize all memory accesses,
 not just the memory accessed by the lock).
 
 As a side note, mount interlock scope could be reduced there.
 
 Index: ufs/ufs/ufs_vnops.c
 ===
 RCS file: /usr/local/arch/ncvs/src/sys/ufs/ufs/ufs_vnops.c,v
 retrieving revision 1.283
 diff -u -r1.283 ufs_vnops.c
 --- ufs/ufs/ufs_vnops.c  6 Nov 2006 13:42:09 -   1.283
 +++ ufs/ufs/ufs_vnops.c  12 Dec 2006 10:18:04 -
 @@ -133,19 +134,19 @@
 {
  struct inode *ip;
  struct timespec ts;
 -int mnt_locked;
 
  ip = VTOI(vp);
 -mnt_locked = 0;
  if ((vp-v_mount-mnt_flag  MNT_RDONLY) != 0) {
  VI_LOCK(vp);
  goto out;
  }
  MNT_ILOCK(vp-v_mount); /* For reading of mnt_kern_flags. */
 -mnt_locked = 1;
  VI_LOCK(vp);
 -if ((ip-i_flag  (IN_ACCESS | IN_CHANGE | IN_UPDATE)) == 0)
 -goto out_unl;
 +if ((ip-i_flag  (IN_ACCESS | IN_CHANGE | IN_UPDATE)) == 0) {
 +MNT_IUNLOCK(vp-v_mount);
 +VI_UNLOCK(vp);
 +return;
 +}
 
 The version that depends on not-very-staleness would test the flags
 without acquiring the lock(s) and return immediately in the usual case
 where none of the flags are set.  It would have to acquire the locks
 and repeat the test to make changes (and the test is already repeated
 one flag at a time).  I think this would be correct enough, but still
 inefficient and/or even messier.  The current organization is usually:
 
 acquire vnode interlock in caller
 release vnode interlock in caller to avoid messes here (inefficient)
 call
 acquire mount interlock
 acquire vnode interlock
 test the flags; goto cleanup code if none set (usual case)
 do the work
 release vnode interlock
 release mount interlock
 return
 acquire vnode interlock (if needed)
 release vnode interlock (if needed)
 
 and it might become:
 
 acquire vnode interlock in caller
 call
 test the flags; return if none set (usual case)
 release vnode interlock   // check that callers are aware of 
 this
 acquire mount interlock
 acquire vnode interlock
 do the work
   // Assume no LOR problem for release, as below.
   // Otherwise need another relese+acquire of vnode interlock.
 release mount interlock
 return
 release vnode interlock
 
 
  if ((vp-v_type == VBLK || vp-v_type == VCHR)  !DOINGSOFTDEP(vp))
  ip-i_flag |= IN_LAZYMOD;
 @@ -155,6 +156,7 @@
  ip-i_flag |= IN_MODIFIED;
  else if (ip-i_flag  IN_ACCESS)
  ip-i_flag |= IN_LAZYACCESS;
 +MNT_IUNLOCK(vp-v_mount);
  vfs_timestamp(ts);
  if (ip-i_flag  IN_ACCESS) {
  DIP_SET(ip, i_atime, ts.tv_sec);
 
 Is there no LOR problem for release?
AFAIK, no. Release is guaranteed to success without blocking.

 
 As I understand it, MNT_ILOCK() is only protecting IN_ACCESS being
 converted to IN_MODIFED, so after this conversion is done the lock
 is not needed.  Is this correct?
The code shall set IN_MODIFIED flag only if MNTK_SUSPEND and
MNTK_SUSPENDED are clean.

The vfs_write_suspend() that set MNTK_SUSPEND, does VFS_SYNC() after setting
the flag. Since VFS_SYNC() 

Re: kqueue LOR

2006-12-12 Thread Attilio Rao

2006/12/12, Kostik Belousov [EMAIL PROTECTED]:

On Tue, Dec 12, 2006 at 12:44:54AM -0800, Suleiman Souhlal wrote:
 Kostik Belousov wrote:
 On Sun, Nov 26, 2006 at 09:30:39AM +0100, V??clav Haisman wrote:
 
 Hi,
 the attached lor.txt contains LOR I got this yesterday. It is FreeBSD 6.1
 with relatively recent kernel, from last week or so.
 
 --
 VH
 
 
 +lock order reversal:
 + 1st 0xc537f300 kqueue (kqueue) @ /usr/src/sys/kern/kern_event.c:1547
 + 2nd 0xc45c22dc struct mount mtx (struct mount mtx) @
 /usr/src/sys/ufs/ufs/ufs_vnops.c:138
 +KDB: stack backtrace:
 +kdb_backtrace(c07f9879,c45c22dc,c07fd31c,c07fd31c,c080c7b2,...) at
 kdb_backtrace+0x2f
 +witness_checkorder(c45c22dc,9,c080c7b2,8a,c07fc6bd,...) at
 witness_checkorder+0x5fe
 +_mtx_lock_flags(c45c22dc,0,c080c7b2,8a,e790ba20,...) at
 _mtx_lock_flags+0x32
 +ufs_itimes(c47a0dd0,c47a0e90,e790ba78,c060e1cc,c47a0dd0,...) at
 ufs_itimes+0x6c
 +ufs_getattr(e790ba54,e790baec,c0622af6,c0896f40,e790ba54,...) at
 ufs_getattr+0x20
 +VOP_GETATTR_APV(c0896f40,e790ba54,c08a5760,c47a0dd0,e790ba74,...) at
 VOP_GETATTR_APV+0x3a
 +filt_vfsread(c4cf261c,6,c07f445e,60b,0,...) at filt_vfsread+0x75
 +knote(c4f57114,6,1,1f30c2af,1f30c2af,...) at knote+0x75
 +VOP_WRITE_APV(c0896f40,e790bbec,c47a0dd0,227,e790bcb4,...) at
 VOP_WRITE_APV+0x148
 +vn_write(c45d5120,e790bcb4,c5802a00,0,c4b73a80,...) at vn_write+0x201
 +dofilewrite(c4b73a80,1b,c45d5120,e790bcb4,,...) at
 dofilewrite+0x84
 +kern_writev(c4b73a80,1b,e790bcb4,8220c71,0,...) at kern_writev+0x65
 +write(c4b73a80,e790bd04,c,c07d899c,3,...) at write+0x4f
 +syscall(3b,3b,bfbf003b,0,bfbfeae4,...) at syscall+0x295
 +Xint0x80_syscall() at Xint0x80_syscall+0x1f
 +--- syscall (4, FreeBSD ELF32, write), eip = 0x2831d727, esp =
 0xbfbfea1c, ebp = 0xbfbfea48 ---
 
 
 Thank you for the report. The LOR is caused by my commit into
 sys/ufs/ufs/ufs_vnops.c, rev. 1.280.

 Is the mount lock really required, if all we're doing is a single read of a
 single word (mnt_kern_flags) (v_mount should be read-only for the whole
 lifetime of the vnode, I believe)? After all, reads of a single word are
 atomic on all our supported architectures.
 The only situation I see where there MIGHT be problems are forced unmounts,
 but I think there are bigger issues with those.
 Sorry for noticing this email only now.

The problem is real with snapshotting. Ignoring
MNTK_SUSPEND/MNTK_SUSPENDED flags (in particular, reading stale value of
mnt_kern_flag) while setting IN_MODIFIED caused deadlock at ufs vnode
inactivation time. This was the big trouble with nfsd and snapshots. As
such, I think that precise value of mmnt_kern_flag is critical there,
and mount interlock is needed.


This can be avoided using a memory barrier when setting flags.
Even if memory barriers usage is not encouraged, some critical code
should really use them replacing a mutex semantic (if that worths it).

Attilio

--
Peace can only be achieved by understanding - A. Einstein
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: kqueue LOR

2006-12-12 Thread Suleiman Souhlal

Attilio Rao wrote:

2006/12/12, Kostik Belousov [EMAIL PROTECTED]:


On Tue, Dec 12, 2006 at 12:44:54AM -0800, Suleiman Souhlal wrote:
 Kostik Belousov wrote:
 On Sun, Nov 26, 2006 at 09:30:39AM +0100, V??clav Haisman wrote:
 
 Hi,
 the attached lor.txt contains LOR I got this yesterday. It is 
FreeBSD 6.1

 with relatively recent kernel, from last week or so.
 
 --
 VH
 
 
 +lock order reversal:
 + 1st 0xc537f300 kqueue (kqueue) @ 
/usr/src/sys/kern/kern_event.c:1547

 + 2nd 0xc45c22dc struct mount mtx (struct mount mtx) @
 /usr/src/sys/ufs/ufs/ufs_vnops.c:138
 +KDB: stack backtrace:
 +kdb_backtrace(c07f9879,c45c22dc,c07fd31c,c07fd31c,c080c7b2,...) at
 kdb_backtrace+0x2f
 +witness_checkorder(c45c22dc,9,c080c7b2,8a,c07fc6bd,...) at
 witness_checkorder+0x5fe
 +_mtx_lock_flags(c45c22dc,0,c080c7b2,8a,e790ba20,...) at
 _mtx_lock_flags+0x32
 +ufs_itimes(c47a0dd0,c47a0e90,e790ba78,c060e1cc,c47a0dd0,...) at
 ufs_itimes+0x6c
 +ufs_getattr(e790ba54,e790baec,c0622af6,c0896f40,e790ba54,...) at
 ufs_getattr+0x20
 +VOP_GETATTR_APV(c0896f40,e790ba54,c08a5760,c47a0dd0,e790ba74,...) at
 VOP_GETATTR_APV+0x3a
 +filt_vfsread(c4cf261c,6,c07f445e,60b,0,...) at filt_vfsread+0x75
 +knote(c4f57114,6,1,1f30c2af,1f30c2af,...) at knote+0x75
 +VOP_WRITE_APV(c0896f40,e790bbec,c47a0dd0,227,e790bcb4,...) at
 VOP_WRITE_APV+0x148
 +vn_write(c45d5120,e790bcb4,c5802a00,0,c4b73a80,...) at 
vn_write+0x201

 +dofilewrite(c4b73a80,1b,c45d5120,e790bcb4,,...) at
 dofilewrite+0x84
 +kern_writev(c4b73a80,1b,e790bcb4,8220c71,0,...) at kern_writev+0x65
 +write(c4b73a80,e790bd04,c,c07d899c,3,...) at write+0x4f
 +syscall(3b,3b,bfbf003b,0,bfbfeae4,...) at syscall+0x295
 +Xint0x80_syscall() at Xint0x80_syscall+0x1f
 +--- syscall (4, FreeBSD ELF32, write), eip = 0x2831d727, esp =
 0xbfbfea1c, ebp = 0xbfbfea48 ---
 
 
 Thank you for the report. The LOR is caused by my commit into
 sys/ufs/ufs/ufs_vnops.c, rev. 1.280.

 Is the mount lock really required, if all we're doing is a single 
read of a

 single word (mnt_kern_flags) (v_mount should be read-only for the whole
 lifetime of the vnode, I believe)? After all, reads of a single word 
are

 atomic on all our supported architectures.
 The only situation I see where there MIGHT be problems are forced 
unmounts,

 but I think there are bigger issues with those.
 Sorry for noticing this email only now.

The problem is real with snapshotting. Ignoring
MNTK_SUSPEND/MNTK_SUSPENDED flags (in particular, reading stale value of
mnt_kern_flag) while setting IN_MODIFIED caused deadlock at ufs vnode
inactivation time. This was the big trouble with nfsd and snapshots. As
such, I think that precise value of mmnt_kern_flag is critical there,
and mount interlock is needed.



This can be avoided using a memory barrier when setting flags.
Even if memory barriers usage is not encouraged, some critical code
should really use them replacing a mutex semantic (if that worths it).


Why is memory barrier usage not encouraged? As you said, they can be used to 
reduce the number of atomic (LOCKed) operations, in some cases.

FWIW, Linux has rmb() (load mem barrier), wmb() (store mem barrier), mb() (load/store mem 
barrier), smp_rmb(), smp_wmb(), smp_mb() (mem barriers only needed on SMP), and barrier() 
(GCC barrier (__asm __volatile (:::memory)) macros that I've personally found 
very useful.
Admittedly, they are harder to use than atomic operations, but it might still 
worth having something similar.

-- Suleiman
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: kqueue LOR

2006-12-12 Thread Attilio Rao

2006/12/12, Suleiman Souhlal [EMAIL PROTECTED]:

Attilio Rao wrote:
 2006/12/12, Kostik Belousov [EMAIL PROTECTED]:

 On Tue, Dec 12, 2006 at 12:44:54AM -0800, Suleiman Souhlal wrote:
  Kostik Belousov wrote:
  On Sun, Nov 26, 2006 at 09:30:39AM +0100, V??clav Haisman wrote:
  
  Hi,
  the attached lor.txt contains LOR I got this yesterday. It is
 FreeBSD 6.1
  with relatively recent kernel, from last week or so.
  
  --
  VH
  
  
  +lock order reversal:
  + 1st 0xc537f300 kqueue (kqueue) @
 /usr/src/sys/kern/kern_event.c:1547
  + 2nd 0xc45c22dc struct mount mtx (struct mount mtx) @
  /usr/src/sys/ufs/ufs/ufs_vnops.c:138
  +KDB: stack backtrace:
  +kdb_backtrace(c07f9879,c45c22dc,c07fd31c,c07fd31c,c080c7b2,...) at
  kdb_backtrace+0x2f
  +witness_checkorder(c45c22dc,9,c080c7b2,8a,c07fc6bd,...) at
  witness_checkorder+0x5fe
  +_mtx_lock_flags(c45c22dc,0,c080c7b2,8a,e790ba20,...) at
  _mtx_lock_flags+0x32
  +ufs_itimes(c47a0dd0,c47a0e90,e790ba78,c060e1cc,c47a0dd0,...) at
  ufs_itimes+0x6c
  +ufs_getattr(e790ba54,e790baec,c0622af6,c0896f40,e790ba54,...) at
  ufs_getattr+0x20
  +VOP_GETATTR_APV(c0896f40,e790ba54,c08a5760,c47a0dd0,e790ba74,...) at
  VOP_GETATTR_APV+0x3a
  +filt_vfsread(c4cf261c,6,c07f445e,60b,0,...) at filt_vfsread+0x75
  +knote(c4f57114,6,1,1f30c2af,1f30c2af,...) at knote+0x75
  +VOP_WRITE_APV(c0896f40,e790bbec,c47a0dd0,227,e790bcb4,...) at
  VOP_WRITE_APV+0x148
  +vn_write(c45d5120,e790bcb4,c5802a00,0,c4b73a80,...) at
 vn_write+0x201
  +dofilewrite(c4b73a80,1b,c45d5120,e790bcb4,,...) at
  dofilewrite+0x84
  +kern_writev(c4b73a80,1b,e790bcb4,8220c71,0,...) at kern_writev+0x65
  +write(c4b73a80,e790bd04,c,c07d899c,3,...) at write+0x4f
  +syscall(3b,3b,bfbf003b,0,bfbfeae4,...) at syscall+0x295
  +Xint0x80_syscall() at Xint0x80_syscall+0x1f
  +--- syscall (4, FreeBSD ELF32, write), eip = 0x2831d727, esp =
  0xbfbfea1c, ebp = 0xbfbfea48 ---
  
  
  Thank you for the report. The LOR is caused by my commit into
  sys/ufs/ufs/ufs_vnops.c, rev. 1.280.
 
  Is the mount lock really required, if all we're doing is a single
 read of a
  single word (mnt_kern_flags) (v_mount should be read-only for the whole
  lifetime of the vnode, I believe)? After all, reads of a single word
 are
  atomic on all our supported architectures.
  The only situation I see where there MIGHT be problems are forced
 unmounts,
  but I think there are bigger issues with those.
  Sorry for noticing this email only now.

 The problem is real with snapshotting. Ignoring
 MNTK_SUSPEND/MNTK_SUSPENDED flags (in particular, reading stale value of
 mnt_kern_flag) while setting IN_MODIFIED caused deadlock at ufs vnode
 inactivation time. This was the big trouble with nfsd and snapshots. As
 such, I think that precise value of mmnt_kern_flag is critical there,
 and mount interlock is needed.


 This can be avoided using a memory barrier when setting flags.
 Even if memory barriers usage is not encouraged, some critical code
 should really use them replacing a mutex semantic (if that worths it).

Why is memory barrier usage not encouraged? As you said, they can be used to 
reduce the number of atomic (LOCKed) operations, in some cases.


Beacause they can lead to errors as it is not so straightforward to
understand when a memory barrier is needed more than an atomic
instruction and so on
(even if it doesn't value, for example, for ia32, for other
architectures memory barriers could be more expensive than the atomic
instruction, without counting a possible error).


FWIW, Linux has rmb() (load mem barrier), wmb() (store mem barrier), mb() (load/store mem 
barrier), smp_rmb(), smp_wmb(), smp_mb() (mem barriers only needed on SMP), and barrier() 
(GCC barrier (__asm __volatile (:::memory)) macros that I've personally found 
very useful.


I think that our memory barriers reflect the usage we do into the
kernel as the base for building syncronizing primitives. From this
point of view our atomic operations (meant into the wider possible
sense, man 9 atomic) are more suitable than having something like
Linux's smp_*().

Attilio


--
Peace can only be achieved by understanding - A. Einstein
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: kqueue LOR

2006-12-12 Thread John Baldwin
On Tuesday 12 December 2006 13:43, Suleiman Souhlal wrote:
 Attilio Rao wrote:
  2006/12/12, Kostik Belousov [EMAIL PROTECTED]:
  
  On Tue, Dec 12, 2006 at 12:44:54AM -0800, Suleiman Souhlal wrote:
   Kostik Belousov wrote:
   On Sun, Nov 26, 2006 at 09:30:39AM +0100, V??clav Haisman wrote:
   
   Hi,
   the attached lor.txt contains LOR I got this yesterday. It is 
  FreeBSD 6.1
   with relatively recent kernel, from last week or so.
   
   --
   VH
   
   
   +lock order reversal:
   + 1st 0xc537f300 kqueue (kqueue) @ 
  /usr/src/sys/kern/kern_event.c:1547
   + 2nd 0xc45c22dc struct mount mtx (struct mount mtx) @
   /usr/src/sys/ufs/ufs/ufs_vnops.c:138
   +KDB: stack backtrace:
   +kdb_backtrace(c07f9879,c45c22dc,c07fd31c,c07fd31c,c080c7b2,...) at
   kdb_backtrace+0x2f
   +witness_checkorder(c45c22dc,9,c080c7b2,8a,c07fc6bd,...) at
   witness_checkorder+0x5fe
   +_mtx_lock_flags(c45c22dc,0,c080c7b2,8a,e790ba20,...) at
   _mtx_lock_flags+0x32
   +ufs_itimes(c47a0dd0,c47a0e90,e790ba78,c060e1cc,c47a0dd0,...) at
   ufs_itimes+0x6c
   +ufs_getattr(e790ba54,e790baec,c0622af6,c0896f40,e790ba54,...) at
   ufs_getattr+0x20
   +VOP_GETATTR_APV(c0896f40,e790ba54,c08a5760,c47a0dd0,e790ba74,...) at
   VOP_GETATTR_APV+0x3a
   +filt_vfsread(c4cf261c,6,c07f445e,60b,0,...) at filt_vfsread+0x75
   +knote(c4f57114,6,1,1f30c2af,1f30c2af,...) at knote+0x75
   +VOP_WRITE_APV(c0896f40,e790bbec,c47a0dd0,227,e790bcb4,...) at
   VOP_WRITE_APV+0x148
   +vn_write(c45d5120,e790bcb4,c5802a00,0,c4b73a80,...) at 
  vn_write+0x201
   +dofilewrite(c4b73a80,1b,c45d5120,e790bcb4,,...) at
   dofilewrite+0x84
   +kern_writev(c4b73a80,1b,e790bcb4,8220c71,0,...) at kern_writev+0x65
   +write(c4b73a80,e790bd04,c,c07d899c,3,...) at write+0x4f
   +syscall(3b,3b,bfbf003b,0,bfbfeae4,...) at syscall+0x295
   +Xint0x80_syscall() at Xint0x80_syscall+0x1f
   +--- syscall (4, FreeBSD ELF32, write), eip = 0x2831d727, esp =
   0xbfbfea1c, ebp = 0xbfbfea48 ---
   
   
   Thank you for the report. The LOR is caused by my commit into
   sys/ufs/ufs/ufs_vnops.c, rev. 1.280.
  
   Is the mount lock really required, if all we're doing is a single 
  read of a
   single word (mnt_kern_flags) (v_mount should be read-only for the whole
   lifetime of the vnode, I believe)? After all, reads of a single word 
  are
   atomic on all our supported architectures.
   The only situation I see where there MIGHT be problems are forced 
  unmounts,
   but I think there are bigger issues with those.
   Sorry for noticing this email only now.
 
  The problem is real with snapshotting. Ignoring
  MNTK_SUSPEND/MNTK_SUSPENDED flags (in particular, reading stale value of
  mnt_kern_flag) while setting IN_MODIFIED caused deadlock at ufs vnode
  inactivation time. This was the big trouble with nfsd and snapshots. As
  such, I think that precise value of mmnt_kern_flag is critical there,
  and mount interlock is needed.
  
  
  This can be avoided using a memory barrier when setting flags.
  Even if memory barriers usage is not encouraged, some critical code
  should really use them replacing a mutex semantic (if that worths it).
 
 Why is memory barrier usage not encouraged? As you said, they can be used to 
reduce the number of atomic (LOCKed) operations, in some cases.
 
 FWIW, Linux has rmb() (load mem barrier), wmb() (store mem barrier), mb() 
(load/store mem barrier), smp_rmb(), smp_wmb(), smp_mb() (mem barriers only 
needed on SMP), and barrier() (GCC barrier (__asm __volatile (:::memory)) 
macros that I've personally found very useful.
 Admittedly, they are harder to use than atomic operations, but it might 
still worth having something similar.

Memory barriers just specify ordering, they don't ensure a cache flush so 
another CPU reads up to date values.  You can use memory barriers in 
conjunction with atomic operations on a variable to ensure that you can 
safely read other variables (which is what locks do).  For example, in this 
case IIUC, you have a race that is because there is shared state between two 
fields, one in the mount structure, and one in the ufs i-node.  Memory 
barriers alone won't prevent you from operating on those flags 
non-consistently.  That is, you have two memory locations in play here, and 
atomic ops only work on a single one.  There isn't an atomic op to do read 
from memory location A, check flag B, and if it's true write C to memory 
location D.  Where would you put the membar in this case to ensure that the 
action always results in consistent behavior?

-- 
John Baldwin
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: kqueue LOR

2006-12-12 Thread Bruce Evans

On Tue, 12 Dec 2006, John Baldwin wrote:


On Tuesday 12 December 2006 13:43, Suleiman Souhlal wrote:



Why is memory barrier usage not encouraged? As you said, they can be used to
reduce the number of atomic (LOCKed) operations, in some cases.
...
Admittedly, they are harder to use than atomic operations, but it might
still worth having something similar.


How would MI code know when using memory barriers is good?  This is already
hard to know for atomic ops -- if there would more than a couple of atomic
ops then it is probably better to use 1 mutex lock/unlock and no atomic
ops, since this reduces the total number of atomic ops in most cases, but
it is hard for MI code to know how many a couple is.  (This also depends
on the SMP option -- without SMP, locking is automatic so atomic ops are
very fast but mutexes are still slow since they do a lot more than an
atomic op.)


Memory barriers just specify ordering, they don't ensure a cache flush so
another CPU reads up to date values.  You can use memory barriers in
conjunction with atomic operations on a variable to ensure that you can
safely read other variables (which is what locks do).  For example, in this


I thought that the acquire/release variants of atomic ops guarantee
this.  They seem to be documented to do this, while mutexes don't seem
to be documented to do this.  The MI (?) implementation of mutexes
depends on atomic_cmpset_{acq,rel}_ptr() doing this.

Bruc
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: kqueue LOR

2006-12-12 Thread Tor Egge
 Hmm, may be, since vnode must be interlocked by ffs_sync() after
 MNTK_SUSPENDED set, and before MNTK_SUSPEND, mount interlock is not
 needed in ufs_itimes.
 
 Tor ?

If neither IN_CHANGE nor IN_UPDATE is set then it might be unsafe to set
IN_MODIFIED in ufs_itimes() since the file system might be suspended or in the
process of being suspended with the vnode sync loop in ffs_sync() having
iterated past the vnode.

I don't think the mount interlock is needed to check for MNTK_SUSPEND being set
in ufs_itimes() as long as the vnode interlock is held.  If a stale value is
read without MNTK_SUSPEND set then the vnode sync loop in ffs_sync() cannot
have iterated past the vnode, thus it should still be safe to set IN_MODIFIED.
All writes by the CPU performing the vnode sync loop before it released the
vnode interlock for the same vnode should be visible to the CPU in ufs_itimes()
after it has obtained the vnode interlock.

- Tor Egge
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: kqueue LOR

2006-11-27 Thread Kostik Belousov
On Sun, Nov 26, 2006 at 09:30:39AM +0100, V??clav Haisman wrote:
 Hi,
 the attached lor.txt contains LOR I got this yesterday. It is FreeBSD 6.1
 with relatively recent kernel, from last week or so.
 
 --
 VH

 +lock order reversal:
 + 1st 0xc537f300 kqueue (kqueue) @ /usr/src/sys/kern/kern_event.c:1547
 + 2nd 0xc45c22dc struct mount mtx (struct mount mtx) @ 
 /usr/src/sys/ufs/ufs/ufs_vnops.c:138
 +KDB: stack backtrace:
 +kdb_backtrace(c07f9879,c45c22dc,c07fd31c,c07fd31c,c080c7b2,...) at 
 kdb_backtrace+0x2f
 +witness_checkorder(c45c22dc,9,c080c7b2,8a,c07fc6bd,...) at 
 witness_checkorder+0x5fe
 +_mtx_lock_flags(c45c22dc,0,c080c7b2,8a,e790ba20,...) at _mtx_lock_flags+0x32
 +ufs_itimes(c47a0dd0,c47a0e90,e790ba78,c060e1cc,c47a0dd0,...) at 
 ufs_itimes+0x6c
 +ufs_getattr(e790ba54,e790baec,c0622af6,c0896f40,e790ba54,...) at 
 ufs_getattr+0x20
 +VOP_GETATTR_APV(c0896f40,e790ba54,c08a5760,c47a0dd0,e790ba74,...) at 
 VOP_GETATTR_APV+0x3a
 +filt_vfsread(c4cf261c,6,c07f445e,60b,0,...) at filt_vfsread+0x75
 +knote(c4f57114,6,1,1f30c2af,1f30c2af,...) at knote+0x75
 +VOP_WRITE_APV(c0896f40,e790bbec,c47a0dd0,227,e790bcb4,...) at 
 VOP_WRITE_APV+0x148
 +vn_write(c45d5120,e790bcb4,c5802a00,0,c4b73a80,...) at vn_write+0x201
 +dofilewrite(c4b73a80,1b,c45d5120,e790bcb4,,...) at dofilewrite+0x84
 +kern_writev(c4b73a80,1b,e790bcb4,8220c71,0,...) at kern_writev+0x65
 +write(c4b73a80,e790bd04,c,c07d899c,3,...) at write+0x4f
 +syscall(3b,3b,bfbf003b,0,bfbfeae4,...) at syscall+0x295
 +Xint0x80_syscall() at Xint0x80_syscall+0x1f
 +--- syscall (4, FreeBSD ELF32, write), eip = 0x2831d727, esp = 0xbfbfea1c, 
 ebp = 0xbfbfea48 ---

Thank you for the report. The LOR is caused by my commit into
sys/ufs/ufs/ufs_vnops.c, rev. 1.280.

What application you run that triggers the LOR ? Patch below is one
possible approach to fixing it.

Index: kern/vnode_if.src
===
RCS file: /usr/local/arch/ncvs/src/sys/kern/vnode_if.src,v
retrieving revision 1.84
diff -u -r1.84 vnode_if.src
--- kern/vnode_if.src   13 Nov 2006 05:51:22 -  1.84
+++ kern/vnode_if.src   26 Nov 2006 15:20:44 -
@@ -164,6 +164,16 @@
 };
 
 
+%% getattrfast vp  L L L
+
+vop_getattrfast {
+   IN struct vnode *vp;
+   OUT struct vattr *vap;
+   IN struct ucred *cred;
+   IN struct thread *td;
+};
+
+
 %% setattr vp  E E E
 %! setattr postvop_setattr_post
 
Index: kern/vfs_default.c
===
RCS file: /usr/local/arch/ncvs/src/sys/kern/vfs_default.c,v
retrieving revision 1.135
diff -u -r1.135 vfs_default.c
--- kern/vfs_default.c  13 Nov 2006 05:51:22 -  1.135
+++ kern/vfs_default.c  26 Nov 2006 15:20:44 -
@@ -62,6 +62,7 @@
 
 static int vop_nolookup(struct vop_lookup_args *);
 static int vop_nostrategy(struct vop_strategy_args *);
+static int vop_stdgetattrfast(struct vop_getattrfast_args *);
 
 /*
  * This vnode table stores what we want to do if the filesystem doesn't
@@ -96,6 +97,7 @@
.vop_revoke =   VOP_PANIC,
.vop_strategy = vop_nostrategy,
.vop_unlock =   vop_stdunlock,
+   .vop_getattrfast =  vop_stdgetattrfast,
 };
 
 /*
@@ -511,6 +513,19 @@
 ap-a_sync, ap-a_rtvals);
 }
 
+static int
+vop_stdgetattrfast(ap)
+   struct vop_getattrfast_args /* {
+   struct vnode *vp;
+   struct vattr *vap;
+   struct ucred *cred;
+   struct thread *td;
+   } */ *ap;
+{
+
+   return VOP_GETATTR(ap-a_vp, ap-a_vap, ap-a_cred, ap-a_td);
+}
+
 /*
  * vfs default ops
  * used to fill the vfs function table to get reasonable default return values.
Index: kern/vfs_subr.c
===
RCS file: /usr/local/arch/ncvs/src/sys/kern/vfs_subr.c,v
retrieving revision 1.692
diff -u -r1.692 vfs_subr.c
--- kern/vfs_subr.c 13 Nov 2006 05:51:22 -  1.692
+++ kern/vfs_subr.c 26 Nov 2006 15:20:44 -
@@ -3828,7 +3828,7 @@
return (1);
}
 
-   if (VOP_GETATTR(vp, va, curthread-td_ucred, curthread)) 
+   if (VOP_GETATTRFAST(vp, va, curthread-td_ucred, curthread)) 
return (0);
 
kn-kn_data = va.va_size - kn-kn_fp-f_offset;
Index: ufs/ufs/ufs_vnops.c
===
RCS file: /usr/local/arch/ncvs/src/sys/ufs/ufs/ufs_vnops.c,v
retrieving revision 1.283
diff -u -r1.283 ufs_vnops.c
--- ufs/ufs/ufs_vnops.c 6 Nov 2006 13:42:09 -   1.283
+++ ufs/ufs/ufs_vnops.c 26 Nov 2006 15:20:44 -
@@ -97,6 +97,7 @@
 static vop_close_t ufs_close;
 static vop_create_tufs_create;
 static vop_getattr_t   ufs_getattr;
+static vop_getattrfast_t   ufs_getattrfast;
 static vop_link_t  ufs_link;
 static int ufs_makeinode(int mode, struct vnode *, struct vnode **, struct 
componentname *);
 static vop_mkdir_t ufs_mkdir;

Re: kqueue LOR

2006-11-27 Thread Václav Haisman
Kostik Belousov wrote:
 On Sun, Nov 26, 2006 at 09:30:39AM +0100, V??clav Haisman wrote:
[...]

 
 Thank you for the report. The LOR is caused by my commit into
 sys/ufs/ufs/ufs_vnops.c, rev. 1.280.
 
 What application you run that triggers the LOR ? Patch below is one
I have no idea, I just found it in daily summary logs.

 possible approach to fixing it.
 
[...]

--
VH



signature.asc
Description: OpenPGP digital signature


kqueue LOR

2006-11-26 Thread Václav Haisman
Hi,
the attached lor.txt contains LOR I got this yesterday. It is FreeBSD 6.1
with relatively recent kernel, from last week or so.

--
VH
+lock order reversal:
+ 1st 0xc537f300 kqueue (kqueue) @ /usr/src/sys/kern/kern_event.c:1547
+ 2nd 0xc45c22dc struct mount mtx (struct mount mtx) @ 
/usr/src/sys/ufs/ufs/ufs_vnops.c:138
+KDB: stack backtrace:
+kdb_backtrace(c07f9879,c45c22dc,c07fd31c,c07fd31c,c080c7b2,...) at 
kdb_backtrace+0x2f
+witness_checkorder(c45c22dc,9,c080c7b2,8a,c07fc6bd,...) at 
witness_checkorder+0x5fe
+_mtx_lock_flags(c45c22dc,0,c080c7b2,8a,e790ba20,...) at _mtx_lock_flags+0x32
+ufs_itimes(c47a0dd0,c47a0e90,e790ba78,c060e1cc,c47a0dd0,...) at ufs_itimes+0x6c
+ufs_getattr(e790ba54,e790baec,c0622af6,c0896f40,e790ba54,...) at 
ufs_getattr+0x20
+VOP_GETATTR_APV(c0896f40,e790ba54,c08a5760,c47a0dd0,e790ba74,...) at 
VOP_GETATTR_APV+0x3a
+filt_vfsread(c4cf261c,6,c07f445e,60b,0,...) at filt_vfsread+0x75
+knote(c4f57114,6,1,1f30c2af,1f30c2af,...) at knote+0x75
+VOP_WRITE_APV(c0896f40,e790bbec,c47a0dd0,227,e790bcb4,...) at 
VOP_WRITE_APV+0x148
+vn_write(c45d5120,e790bcb4,c5802a00,0,c4b73a80,...) at vn_write+0x201
+dofilewrite(c4b73a80,1b,c45d5120,e790bcb4,,...) at dofilewrite+0x84
+kern_writev(c4b73a80,1b,e790bcb4,8220c71,0,...) at kern_writev+0x65
+write(c4b73a80,e790bd04,c,c07d899c,3,...) at write+0x4f
+syscall(3b,3b,bfbf003b,0,bfbfeae4,...) at syscall+0x295
+Xint0x80_syscall() at Xint0x80_syscall+0x1f
+--- syscall (4, FreeBSD ELF32, write), eip = 0x2831d727, esp = 0xbfbfea1c, ebp 
= 0xbfbfea48 ---


signature.asc
Description: OpenPGP digital signature


Re: kqueue LOR

2006-11-26 Thread Bjoern A. Zeeb

On Sun, 26 Nov 2006, Václav Haisman wrote:


Hi,
the attached lor.txt contains LOR I got this yesterday. It is FreeBSD 6.1
with relatively recent kernel, from last week or so.



lock order reversal:
 1st 0xc537f300 kqueue (kqueue) @ sys/kern/kern_event.c:1547
 2nd 0xc45c22dc struct mount mtx (struct mount mtx) @ 
sys/ufs/ufs/ufs_vnops.c:138


Added to The LOR page with LOR ID # 193:
http://sources.zabbadoz.net/freebsd/lor.html#193

--
Bjoern A. Zeeb  bzeeb at Zabbadoz dot NeT___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]