Re: [GIT PULL] overlay filesystem v25

2014-10-28 Thread Paul E. McKenney
On Tue, Oct 28, 2014 at 10:55:13PM +, Al Viro wrote:
> On Mon, Oct 27, 2014 at 09:11:27PM -0700, Paul E. McKenney wrote:
> > On Tue, Oct 28, 2014 at 01:12:14AM +, Al Viro wrote:
> > > On Mon, Oct 27, 2014 at 10:36:21AM -0700, Paul E. McKenney wrote:
> > > > Code making direct use of smp_read_barrier_depends() is harder to read,
> > > > in my experience, but good point on the sparse noise.  Maybe a new
> > > > lockless_dereference() primitive?  Maybe something like the following?
> > > > (Untested, probably does not even build.)
> > > > 
> > > > #define lockless_dereference(p) \
> > > > ({ \
> > > > typeof(*p) *_p1 = ACCESS_ONCE(p); \
> > > > smp_read_barrier_depends(); /* Dependency order vs. p above. */ 
> > > > \
> > > > _p1; \
> > > > })
> > > 
> > > Hmm...  Where would you prefer to put it?  rcupdate.h?
> > 
> > Good a place as any, I guess.  Please see patch below.  Left to myself,
> > I would send this along for the next merge window, but please let me
> > know if you would like it sooner.
> 
> It's needed sooner, unfortunately.  Guys, could you take a look at
> vfs.git#for-linus and comment?

The version in your tree looks good.  I will drop my commit in favor
of yours.

The use of lockless_dereference() and smp_mb__before_spinlock() in
d45f00ae43 (overlayfs: barriers for opening upper-layer directory)
looks fine to me.  Only nit is lack of space between "=" and
lockless_dereference().

Thanx, Paul

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [GIT PULL] overlay filesystem v25

2014-10-28 Thread Al Viro
On Mon, Oct 27, 2014 at 09:11:27PM -0700, Paul E. McKenney wrote:
> On Tue, Oct 28, 2014 at 01:12:14AM +, Al Viro wrote:
> > On Mon, Oct 27, 2014 at 10:36:21AM -0700, Paul E. McKenney wrote:
> > > Code making direct use of smp_read_barrier_depends() is harder to read,
> > > in my experience, but good point on the sparse noise.  Maybe a new
> > > lockless_dereference() primitive?  Maybe something like the following?
> > > (Untested, probably does not even build.)
> > > 
> > > #define lockless_dereference(p) \
> > > ({ \
> > >   typeof(*p) *_p1 = ACCESS_ONCE(p); \
> > >   smp_read_barrier_depends(); /* Dependency order vs. p above. */ \
> > >   _p1; \
> > > })
> > 
> > Hmm...  Where would you prefer to put it?  rcupdate.h?
> 
> Good a place as any, I guess.  Please see patch below.  Left to myself,
> I would send this along for the next merge window, but please let me
> know if you would like it sooner.

It's needed sooner, unfortunately.  Guys, could you take a look at
vfs.git#for-linus and comment?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [GIT PULL] overlay filesystem v25

2014-10-28 Thread Al Viro
On Mon, Oct 27, 2014 at 09:11:27PM -0700, Paul E. McKenney wrote:
 On Tue, Oct 28, 2014 at 01:12:14AM +, Al Viro wrote:
  On Mon, Oct 27, 2014 at 10:36:21AM -0700, Paul E. McKenney wrote:
   Code making direct use of smp_read_barrier_depends() is harder to read,
   in my experience, but good point on the sparse noise.  Maybe a new
   lockless_dereference() primitive?  Maybe something like the following?
   (Untested, probably does not even build.)
   
   #define lockless_dereference(p) \
   ({ \
 typeof(*p) *_p1 = ACCESS_ONCE(p); \
 smp_read_barrier_depends(); /* Dependency order vs. p above. */ \
 _p1; \
   })
  
  Hmm...  Where would you prefer to put it?  rcupdate.h?
 
 Good a place as any, I guess.  Please see patch below.  Left to myself,
 I would send this along for the next merge window, but please let me
 know if you would like it sooner.

It's needed sooner, unfortunately.  Guys, could you take a look at
vfs.git#for-linus and comment?
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [GIT PULL] overlay filesystem v25

2014-10-28 Thread Paul E. McKenney
On Tue, Oct 28, 2014 at 10:55:13PM +, Al Viro wrote:
 On Mon, Oct 27, 2014 at 09:11:27PM -0700, Paul E. McKenney wrote:
  On Tue, Oct 28, 2014 at 01:12:14AM +, Al Viro wrote:
   On Mon, Oct 27, 2014 at 10:36:21AM -0700, Paul E. McKenney wrote:
Code making direct use of smp_read_barrier_depends() is harder to read,
in my experience, but good point on the sparse noise.  Maybe a new
lockless_dereference() primitive?  Maybe something like the following?
(Untested, probably does not even build.)

#define lockless_dereference(p) \
({ \
typeof(*p) *_p1 = ACCESS_ONCE(p); \
smp_read_barrier_depends(); /* Dependency order vs. p above. */ 
\
_p1; \
})
   
   Hmm...  Where would you prefer to put it?  rcupdate.h?
  
  Good a place as any, I guess.  Please see patch below.  Left to myself,
  I would send this along for the next merge window, but please let me
  know if you would like it sooner.
 
 It's needed sooner, unfortunately.  Guys, could you take a look at
 vfs.git#for-linus and comment?

The version in your tree looks good.  I will drop my commit in favor
of yours.

The use of lockless_dereference() and smp_mb__before_spinlock() in
d45f00ae43 (overlayfs: barriers for opening upper-layer directory)
looks fine to me.  Only nit is lack of space between = and
lockless_dereference().

Thanx, Paul

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [GIT PULL] overlay filesystem v25

2014-10-27 Thread Paul E. McKenney
On Tue, Oct 28, 2014 at 01:12:14AM +, Al Viro wrote:
> On Mon, Oct 27, 2014 at 10:36:21AM -0700, Paul E. McKenney wrote:
> > Code making direct use of smp_read_barrier_depends() is harder to read,
> > in my experience, but good point on the sparse noise.  Maybe a new
> > lockless_dereference() primitive?  Maybe something like the following?
> > (Untested, probably does not even build.)
> > 
> > #define lockless_dereference(p) \
> > ({ \
> > typeof(*p) *_p1 = ACCESS_ONCE(p); \
> > smp_read_barrier_depends(); /* Dependency order vs. p above. */ \
> > _p1; \
> > })
> 
> Hmm...  Where would you prefer to put it?  rcupdate.h?

Good a place as any, I guess.  Please see patch below.  Left to myself,
I would send this along for the next merge window, but please let me
know if you would like it sooner.

Thanx, Paul



rcu: Provide counterpart to rcu_dereference() for non-RCU situations

Although rcu_dereference() and friends can be used in situations where
object lifetimes are being managed by something other than RCU, the
resulting sparse and lockdep-RCU noise can be annoying.  This commit
therefore supplies a lockless_dereference(), which provides the
protection for dereferences without the RCU-related debugging noise.

Reported-by: Al Viro 
Signed-off-by: Paul E. McKenney 

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 9575c2d403b5..ed4f5939a452 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -617,6 +617,21 @@ static inline void rcu_preempt_sleep_check(void)
 #define RCU_INITIALIZER(v) (typeof(*(v)) __force __rcu *)(v)
 
 /**
+ * lockless_dereference() - safely load a pointer for later dereference
+ * @p: The pointer to load
+ *
+ * Similar to rcu_dereference(), but for situations where the pointed-to
+ * object's lifetime is managed by something other than RCU.  That
+ * "something other" might be reference counting or simple immortality.
+ */
+#define lockless_dereference(p) \
+({ \
+   typeof(p) _p1 = ACCESS_ONCE(p); \
+   smp_read_barrier_depends(); /* Dependency order vs. p above. */ \
+   (_p1); \
+})
+
+/**
  * rcu_assign_pointer() - assign to RCU-protected pointer
  * @p: pointer to assign to
  * @v: value to assign (publish)

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [GIT PULL] overlay filesystem v25

2014-10-27 Thread Al Viro
On Mon, Oct 27, 2014 at 10:36:21AM -0700, Paul E. McKenney wrote:
> Code making direct use of smp_read_barrier_depends() is harder to read,
> in my experience, but good point on the sparse noise.  Maybe a new
> lockless_dereference() primitive?  Maybe something like the following?
> (Untested, probably does not even build.)
> 
> #define lockless_dereference(p) \
> ({ \
>   typeof(*p) *_p1 = ACCESS_ONCE(p); \
>   smp_read_barrier_depends(); /* Dependency order vs. p above. */ \
>   _p1; \
> })

Hmm...  Where would you prefer to put it?  rcupdate.h?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [GIT PULL] overlay filesystem v25

2014-10-27 Thread Paul E. McKenney
On Mon, Oct 27, 2014 at 05:28:01PM +, Al Viro wrote:
> On Mon, Oct 27, 2014 at 08:59:01AM -0700, Paul E. McKenney wrote:
> 
> > Indeed, life is hard here.  Keep in mind that lock acquisition is not
> > guaranteed to prevent prior operations from being reordered into the
> > critical section, possibly as follows:
> > 
> > CPU1:
> >   grab lock
> >   if (!global)
> >   global = p;
> >   /* Assume all of CPU2's accesses happen here. */
> >   p->foo = 1;
> 
> A bit of context: p is a local pointer to struct file here and alloc is
> opening it...

OK, good to know.  ;-)

> > This clearly allows CPU2 to execute as follows:
> > 
> > CPU2:
> >   p = global; /* gets p */
> >   if (p) /* non-NULL */
> > q = p->foo; /* might not be 1 */
> > 
> > Not only that, on DEC Alpha, even if CPU1's accesses are ordered, CPU2's
> > accesses can be misordered.  You need rcu_dereference() or the combination
> > of ACCESS_ONCE() and smp_read_barrier_depends() to avoid this issue.
> > As always, see http://www.openvms.compaq.com/wizard/wiz_2637.html for
> > more info.
> > 
> > So no, there is no guarantee.  I am assuming that the lock grabbed by
> > CPU1 guards all assignments to "global", otherwise the code needs further
> > help.  I am further assuming that the memory pointed to by CPU1's "p"
> > is inaccessible to any other CPU, as in CPU1 just allocated the memory.
> > Otherwise, the assignment "p->foo = 1" is questionable.  And finally,
> > I am assuming that p->foo stays constant once it has been made
> > accessible to readers.
> > 
> > But the following should work:
> > 
> > CPU1:
> >   p->foo = 1;  /* Assumes p is local. */
> >   smp_mb__before_spinlock();
> >   grab lock
> >   if (!global)  /* Assumes lock protects all assignments to global. */
> >   global = p;
> > 
> > CPU2:
> >   p = rcu_dereference(global);
> >   if (p)
> >  q = p->foo; /* Assumes p->foo constant once visible to readers. */
> >  /* Also assumes p and q are local. */
> > 
> > If the assumptions called out in the comments do not hold, you at least
> > need ACCESS_ONCE(), and perhaps even more synchronization.  For more info
> > on ACCESS_ONCE(), Jon's LWN article is at http://lwn.net/Articles/508991/.
> 
> First of all, this "p->foo = 1" is a shorthand for initialization of
> struct file done by opening a file.  What you are saying is that it
> can leak past the point where we stick a pointer to freshly opened
> file into a place where another CPU can see it, but not past the
> barrier in dropping the lock, right?

Exactly!

I should also add that smp_mb__before_spinlock() implies smp_wmb(), but
nothing more.  But that is OK because smp_wmb() suffices in this case.

> And you are suggesting rcu_dereference() as a way to bring the required
> barriers in.  Ouch...  The names are really bad, but there's another
> fun issue - rcu_dereference brings in sparse noise.  Wouldn't direct use
> of smp_read_barrier_depends() be cleaner, anyway?

Code making direct use of smp_read_barrier_depends() is harder to read,
in my experience, but good point on the sparse noise.  Maybe a new
lockless_dereference() primitive?  Maybe something like the following?
(Untested, probably does not even build.)

#define lockless_dereference(p) \
({ \
typeof(*p) *_p1 = ACCESS_ONCE(p); \
smp_read_barrier_depends(); /* Dependency order vs. p above. */ \
_p1; \
})

Thanx, Paul

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [GIT PULL] overlay filesystem v25

2014-10-27 Thread Al Viro
On Mon, Oct 27, 2014 at 08:59:01AM -0700, Paul E. McKenney wrote:

> Indeed, life is hard here.  Keep in mind that lock acquisition is not
> guaranteed to prevent prior operations from being reordered into the
> critical section, possibly as follows:
> 
>   CPU1:
> grab lock
> if (!global)
> global = p;
> /* Assume all of CPU2's accesses happen here. */
> p->foo = 1;

A bit of context: p is a local pointer to struct file here and alloc is
opening it...

> This clearly allows CPU2 to execute as follows:
> 
>   CPU2:
> p = global; /* gets p */
> if (p) /* non-NULL */
>   q = p->foo; /* might not be 1 */
> 
> Not only that, on DEC Alpha, even if CPU1's accesses are ordered, CPU2's
> accesses can be misordered.  You need rcu_dereference() or the combination
> of ACCESS_ONCE() and smp_read_barrier_depends() to avoid this issue.
> As always, see http://www.openvms.compaq.com/wizard/wiz_2637.html for
> more info.
> 
> So no, there is no guarantee.  I am assuming that the lock grabbed by
> CPU1 guards all assignments to "global", otherwise the code needs further
> help.  I am further assuming that the memory pointed to by CPU1's "p"
> is inaccessible to any other CPU, as in CPU1 just allocated the memory.
> Otherwise, the assignment "p->foo = 1" is questionable.  And finally,
> I am assuming that p->foo stays constant once it has been made
> accessible to readers.
> 
> But the following should work:
> 
>   CPU1:
> p->foo = 1;  /* Assumes p is local. */
> smp_mb__before_spinlock();
> grab lock
> if (!global)  /* Assumes lock protects all assignments to global. */
> global = p;
> 
>   CPU2:
> p = rcu_dereference(global);
> if (p)
>q = p->foo; /* Assumes p->foo constant once visible to readers. */
>/* Also assumes p and q are local. */
> 
> If the assumptions called out in the comments do not hold, you at least
> need ACCESS_ONCE(), and perhaps even more synchronization.  For more info
> on ACCESS_ONCE(), Jon's LWN article is at http://lwn.net/Articles/508991/.

First of all, this "p->foo = 1" is a shorthand for initialization of
struct file done by opening a file.  What you are saying is that it
can leak past the point where we stick a pointer to freshly opened
file into a place where another CPU can see it, but not past the
barrier in dropping the lock, right?

And you are suggesting rcu_dereference() as a way to bring the required
barriers in.  Ouch...  The names are really bad, but there's another
fun issue - rcu_dereference brings in sparse noise.  Wouldn't direct use
of smp_read_barrier_depends() be cleaner, anyway?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [GIT PULL] overlay filesystem v25

2014-10-27 Thread Paul E. McKenney
On Mon, Oct 27, 2014 at 09:06:54AM +0100, Miklos Szeredi wrote:
> [Paul McKenney added to CC]
> 
> On Sat, Oct 25, 2014 at 7:06 PM, Al Viro  wrote:
> > On Sat, Oct 25, 2014 at 11:53:52AM +0200, Miklos Szeredi wrote:
> >
> >> Yes, but it's not about race with copy-up (which the ovl_path_upper()
> >> protects against), but race of two fsync calls with each other.  If
> >> there's no synchronization between them, then that od->upperfile does
> >> indeed count as lockless access, no matter that the assignment was
> >> done under lock.
> >
> > p = global;
> > if (!p) {   // outside of lock
> > p = alloc();
> > grab lock
> > if (!global) {
> > global = p;
> > } else {
> > destroy(p);
> > p = global;
> > }
> > drop lock
> > }
> > is a very common pattern, especially if you look for cases when lock is
> > a spinlock and allocation is blocking (in those cases you'll often see
> > destroy() part done after dropping the lock; that's where what I fucked up 
> > in
> > what I'd originally pushed.  And it wasn't even needed - fput() under
> > ->i_mutex is OK...)
> 
> Being a very common pattern does not automatically make it correct...
> 
> My understanding of these issues is very limited, but it's not clear
> to me what will order initialization of members of p with the storing
> of p into global.  E.g. we start out with global == NULL and p->foo ==
> 0.
> 
> CPU1:
>   p->foo = 1
>   grab lock
>   if (!global)
>   global = p
> 
> CPU1:

If it is all the same to you, I will call this CPU2 to distinguish it from
the first CPU1.  ;-)

>   p = global
>   if (p)
>  q = p->foo
> 
> Is it guaranteed that the above sequence (as is, without any barriers
> or ACCESS_ONCE() other than the lock acquisition) will result in q ==
> 1 if p != NULL?

Indeed, life is hard here.  Keep in mind that lock acquisition is not
guaranteed to prevent prior operations from being reordered into the
critical section, possibly as follows:

CPU1:
  grab lock
  if (!global)
  global = p;
  /* Assume all of CPU2's accesses happen here. */
  p->foo = 1;

This clearly allows CPU2 to execute as follows:

CPU2:
  p = global; /* gets p */
  if (p) /* non-NULL */
q = p->foo; /* might not be 1 */

Not only that, on DEC Alpha, even if CPU1's accesses are ordered, CPU2's
accesses can be misordered.  You need rcu_dereference() or the combination
of ACCESS_ONCE() and smp_read_barrier_depends() to avoid this issue.
As always, see http://www.openvms.compaq.com/wizard/wiz_2637.html for
more info.

So no, there is no guarantee.  I am assuming that the lock grabbed by
CPU1 guards all assignments to "global", otherwise the code needs further
help.  I am further assuming that the memory pointed to by CPU1's "p"
is inaccessible to any other CPU, as in CPU1 just allocated the memory.
Otherwise, the assignment "p->foo = 1" is questionable.  And finally,
I am assuming that p->foo stays constant once it has been made
accessible to readers.

But the following should work:

CPU1:
  p->foo = 1;  /* Assumes p is local. */
  smp_mb__before_spinlock();
  grab lock
  if (!global)  /* Assumes lock protects all assignments to global. */
  global = p;

CPU2:
  p = rcu_dereference(global);
  if (p)
 q = p->foo; /* Assumes p->foo constant once visible to readers. */
 /* Also assumes p and q are local. */

If the assumptions called out in the comments do not hold, you at least
need ACCESS_ONCE(), and perhaps even more synchronization.  For more info
on ACCESS_ONCE(), Jon's LWN article is at http://lwn.net/Articles/508991/.

Thanx, Paul

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [GIT PULL] overlay filesystem v25

2014-10-27 Thread Miklos Szeredi
[Paul McKenney added to CC]

On Sat, Oct 25, 2014 at 7:06 PM, Al Viro  wrote:
> On Sat, Oct 25, 2014 at 11:53:52AM +0200, Miklos Szeredi wrote:
>
>> Yes, but it's not about race with copy-up (which the ovl_path_upper()
>> protects against), but race of two fsync calls with each other.  If
>> there's no synchronization between them, then that od->upperfile does
>> indeed count as lockless access, no matter that the assignment was
>> done under lock.
>
> p = global;
> if (!p) {   // outside of lock
> p = alloc();
> grab lock
> if (!global) {
> global = p;
> } else {
> destroy(p);
> p = global;
> }
> drop lock
> }
> is a very common pattern, especially if you look for cases when lock is
> a spinlock and allocation is blocking (in those cases you'll often see
> destroy() part done after dropping the lock; that's where what I fucked up in
> what I'd originally pushed.  And it wasn't even needed - fput() under
> ->i_mutex is OK...)

Being a very common pattern does not automatically make it correct...

My understanding of these issues is very limited, but it's not clear
to me what will order initialization of members of p with the storing
of p into global.  E.g. we start out with global == NULL and p->foo ==
0.

CPU1:
  p->foo = 1
  grab lock
  if (!global)
  global = p

CPU1:
  p = global
  if (p)
 q = p->foo

Is it guaranteed that the above sequence (as is, without any barriers
or ACCESS_ONCE() other than the lock acquisition) will result in q ==
1 if p != NULL?

Thanks,
Miklos
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [GIT PULL] overlay filesystem v25

2014-10-27 Thread Miklos Szeredi
[Paul McKenney added to CC]

On Sat, Oct 25, 2014 at 7:06 PM, Al Viro v...@zeniv.linux.org.uk wrote:
 On Sat, Oct 25, 2014 at 11:53:52AM +0200, Miklos Szeredi wrote:

 Yes, but it's not about race with copy-up (which the ovl_path_upper()
 protects against), but race of two fsync calls with each other.  If
 there's no synchronization between them, then that od-upperfile does
 indeed count as lockless access, no matter that the assignment was
 done under lock.

 p = global;
 if (!p) {   // outside of lock
 p = alloc();
 grab lock
 if (!global) {
 global = p;
 } else {
 destroy(p);
 p = global;
 }
 drop lock
 }
 is a very common pattern, especially if you look for cases when lock is
 a spinlock and allocation is blocking (in those cases you'll often see
 destroy() part done after dropping the lock; that's where what I fucked up in
 what I'd originally pushed.  And it wasn't even needed - fput() under
 -i_mutex is OK...)

Being a very common pattern does not automatically make it correct...

My understanding of these issues is very limited, but it's not clear
to me what will order initialization of members of p with the storing
of p into global.  E.g. we start out with global == NULL and p-foo ==
0.

CPU1:
  p-foo = 1
  grab lock
  if (!global)
  global = p

CPU1:
  p = global
  if (p)
 q = p-foo

Is it guaranteed that the above sequence (as is, without any barriers
or ACCESS_ONCE() other than the lock acquisition) will result in q ==
1 if p != NULL?

Thanks,
Miklos
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [GIT PULL] overlay filesystem v25

2014-10-27 Thread Paul E. McKenney
On Mon, Oct 27, 2014 at 09:06:54AM +0100, Miklos Szeredi wrote:
 [Paul McKenney added to CC]
 
 On Sat, Oct 25, 2014 at 7:06 PM, Al Viro v...@zeniv.linux.org.uk wrote:
  On Sat, Oct 25, 2014 at 11:53:52AM +0200, Miklos Szeredi wrote:
 
  Yes, but it's not about race with copy-up (which the ovl_path_upper()
  protects against), but race of two fsync calls with each other.  If
  there's no synchronization between them, then that od-upperfile does
  indeed count as lockless access, no matter that the assignment was
  done under lock.
 
  p = global;
  if (!p) {   // outside of lock
  p = alloc();
  grab lock
  if (!global) {
  global = p;
  } else {
  destroy(p);
  p = global;
  }
  drop lock
  }
  is a very common pattern, especially if you look for cases when lock is
  a spinlock and allocation is blocking (in those cases you'll often see
  destroy() part done after dropping the lock; that's where what I fucked up 
  in
  what I'd originally pushed.  And it wasn't even needed - fput() under
  -i_mutex is OK...)
 
 Being a very common pattern does not automatically make it correct...
 
 My understanding of these issues is very limited, but it's not clear
 to me what will order initialization of members of p with the storing
 of p into global.  E.g. we start out with global == NULL and p-foo ==
 0.
 
 CPU1:
   p-foo = 1
   grab lock
   if (!global)
   global = p
 
 CPU1:

If it is all the same to you, I will call this CPU2 to distinguish it from
the first CPU1.  ;-)

   p = global
   if (p)
  q = p-foo
 
 Is it guaranteed that the above sequence (as is, without any barriers
 or ACCESS_ONCE() other than the lock acquisition) will result in q ==
 1 if p != NULL?

Indeed, life is hard here.  Keep in mind that lock acquisition is not
guaranteed to prevent prior operations from being reordered into the
critical section, possibly as follows:

CPU1:
  grab lock
  if (!global)
  global = p;
  /* Assume all of CPU2's accesses happen here. */
  p-foo = 1;

This clearly allows CPU2 to execute as follows:

CPU2:
  p = global; /* gets p */
  if (p) /* non-NULL */
q = p-foo; /* might not be 1 */

Not only that, on DEC Alpha, even if CPU1's accesses are ordered, CPU2's
accesses can be misordered.  You need rcu_dereference() or the combination
of ACCESS_ONCE() and smp_read_barrier_depends() to avoid this issue.
As always, see http://www.openvms.compaq.com/wizard/wiz_2637.html for
more info.

So no, there is no guarantee.  I am assuming that the lock grabbed by
CPU1 guards all assignments to global, otherwise the code needs further
help.  I am further assuming that the memory pointed to by CPU1's p
is inaccessible to any other CPU, as in CPU1 just allocated the memory.
Otherwise, the assignment p-foo = 1 is questionable.  And finally,
I am assuming that p-foo stays constant once it has been made
accessible to readers.

But the following should work:

CPU1:
  p-foo = 1;  /* Assumes p is local. */
  smp_mb__before_spinlock();
  grab lock
  if (!global)  /* Assumes lock protects all assignments to global. */
  global = p;

CPU2:
  p = rcu_dereference(global);
  if (p)
 q = p-foo; /* Assumes p-foo constant once visible to readers. */
 /* Also assumes p and q are local. */

If the assumptions called out in the comments do not hold, you at least
need ACCESS_ONCE(), and perhaps even more synchronization.  For more info
on ACCESS_ONCE(), Jon's LWN article is at http://lwn.net/Articles/508991/.

Thanx, Paul

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [GIT PULL] overlay filesystem v25

2014-10-27 Thread Al Viro
On Mon, Oct 27, 2014 at 08:59:01AM -0700, Paul E. McKenney wrote:

 Indeed, life is hard here.  Keep in mind that lock acquisition is not
 guaranteed to prevent prior operations from being reordered into the
 critical section, possibly as follows:
 
   CPU1:
 grab lock
 if (!global)
 global = p;
 /* Assume all of CPU2's accesses happen here. */
 p-foo = 1;

A bit of context: p is a local pointer to struct file here and alloc is
opening it...

 This clearly allows CPU2 to execute as follows:
 
   CPU2:
 p = global; /* gets p */
 if (p) /* non-NULL */
   q = p-foo; /* might not be 1 */
 
 Not only that, on DEC Alpha, even if CPU1's accesses are ordered, CPU2's
 accesses can be misordered.  You need rcu_dereference() or the combination
 of ACCESS_ONCE() and smp_read_barrier_depends() to avoid this issue.
 As always, see http://www.openvms.compaq.com/wizard/wiz_2637.html for
 more info.
 
 So no, there is no guarantee.  I am assuming that the lock grabbed by
 CPU1 guards all assignments to global, otherwise the code needs further
 help.  I am further assuming that the memory pointed to by CPU1's p
 is inaccessible to any other CPU, as in CPU1 just allocated the memory.
 Otherwise, the assignment p-foo = 1 is questionable.  And finally,
 I am assuming that p-foo stays constant once it has been made
 accessible to readers.
 
 But the following should work:
 
   CPU1:
 p-foo = 1;  /* Assumes p is local. */
 smp_mb__before_spinlock();
 grab lock
 if (!global)  /* Assumes lock protects all assignments to global. */
 global = p;
 
   CPU2:
 p = rcu_dereference(global);
 if (p)
q = p-foo; /* Assumes p-foo constant once visible to readers. */
/* Also assumes p and q are local. */
 
 If the assumptions called out in the comments do not hold, you at least
 need ACCESS_ONCE(), and perhaps even more synchronization.  For more info
 on ACCESS_ONCE(), Jon's LWN article is at http://lwn.net/Articles/508991/.

First of all, this p-foo = 1 is a shorthand for initialization of
struct file done by opening a file.  What you are saying is that it
can leak past the point where we stick a pointer to freshly opened
file into a place where another CPU can see it, but not past the
barrier in dropping the lock, right?

And you are suggesting rcu_dereference() as a way to bring the required
barriers in.  Ouch...  The names are really bad, but there's another
fun issue - rcu_dereference brings in sparse noise.  Wouldn't direct use
of smp_read_barrier_depends() be cleaner, anyway?
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [GIT PULL] overlay filesystem v25

2014-10-27 Thread Paul E. McKenney
On Mon, Oct 27, 2014 at 05:28:01PM +, Al Viro wrote:
 On Mon, Oct 27, 2014 at 08:59:01AM -0700, Paul E. McKenney wrote:
 
  Indeed, life is hard here.  Keep in mind that lock acquisition is not
  guaranteed to prevent prior operations from being reordered into the
  critical section, possibly as follows:
  
  CPU1:
grab lock
if (!global)
global = p;
/* Assume all of CPU2's accesses happen here. */
p-foo = 1;
 
 A bit of context: p is a local pointer to struct file here and alloc is
 opening it...

OK, good to know.  ;-)

  This clearly allows CPU2 to execute as follows:
  
  CPU2:
p = global; /* gets p */
if (p) /* non-NULL */
  q = p-foo; /* might not be 1 */
  
  Not only that, on DEC Alpha, even if CPU1's accesses are ordered, CPU2's
  accesses can be misordered.  You need rcu_dereference() or the combination
  of ACCESS_ONCE() and smp_read_barrier_depends() to avoid this issue.
  As always, see http://www.openvms.compaq.com/wizard/wiz_2637.html for
  more info.
  
  So no, there is no guarantee.  I am assuming that the lock grabbed by
  CPU1 guards all assignments to global, otherwise the code needs further
  help.  I am further assuming that the memory pointed to by CPU1's p
  is inaccessible to any other CPU, as in CPU1 just allocated the memory.
  Otherwise, the assignment p-foo = 1 is questionable.  And finally,
  I am assuming that p-foo stays constant once it has been made
  accessible to readers.
  
  But the following should work:
  
  CPU1:
p-foo = 1;  /* Assumes p is local. */
smp_mb__before_spinlock();
grab lock
if (!global)  /* Assumes lock protects all assignments to global. */
global = p;
  
  CPU2:
p = rcu_dereference(global);
if (p)
   q = p-foo; /* Assumes p-foo constant once visible to readers. */
   /* Also assumes p and q are local. */
  
  If the assumptions called out in the comments do not hold, you at least
  need ACCESS_ONCE(), and perhaps even more synchronization.  For more info
  on ACCESS_ONCE(), Jon's LWN article is at http://lwn.net/Articles/508991/.
 
 First of all, this p-foo = 1 is a shorthand for initialization of
 struct file done by opening a file.  What you are saying is that it
 can leak past the point where we stick a pointer to freshly opened
 file into a place where another CPU can see it, but not past the
 barrier in dropping the lock, right?

Exactly!

I should also add that smp_mb__before_spinlock() implies smp_wmb(), but
nothing more.  But that is OK because smp_wmb() suffices in this case.

 And you are suggesting rcu_dereference() as a way to bring the required
 barriers in.  Ouch...  The names are really bad, but there's another
 fun issue - rcu_dereference brings in sparse noise.  Wouldn't direct use
 of smp_read_barrier_depends() be cleaner, anyway?

Code making direct use of smp_read_barrier_depends() is harder to read,
in my experience, but good point on the sparse noise.  Maybe a new
lockless_dereference() primitive?  Maybe something like the following?
(Untested, probably does not even build.)

#define lockless_dereference(p) \
({ \
typeof(*p) *_p1 = ACCESS_ONCE(p); \
smp_read_barrier_depends(); /* Dependency order vs. p above. */ \
_p1; \
})

Thanx, Paul

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [GIT PULL] overlay filesystem v25

2014-10-27 Thread Al Viro
On Mon, Oct 27, 2014 at 10:36:21AM -0700, Paul E. McKenney wrote:
 Code making direct use of smp_read_barrier_depends() is harder to read,
 in my experience, but good point on the sparse noise.  Maybe a new
 lockless_dereference() primitive?  Maybe something like the following?
 (Untested, probably does not even build.)
 
 #define lockless_dereference(p) \
 ({ \
   typeof(*p) *_p1 = ACCESS_ONCE(p); \
   smp_read_barrier_depends(); /* Dependency order vs. p above. */ \
   _p1; \
 })

Hmm...  Where would you prefer to put it?  rcupdate.h?
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [GIT PULL] overlay filesystem v25

2014-10-27 Thread Paul E. McKenney
On Tue, Oct 28, 2014 at 01:12:14AM +, Al Viro wrote:
 On Mon, Oct 27, 2014 at 10:36:21AM -0700, Paul E. McKenney wrote:
  Code making direct use of smp_read_barrier_depends() is harder to read,
  in my experience, but good point on the sparse noise.  Maybe a new
  lockless_dereference() primitive?  Maybe something like the following?
  (Untested, probably does not even build.)
  
  #define lockless_dereference(p) \
  ({ \
  typeof(*p) *_p1 = ACCESS_ONCE(p); \
  smp_read_barrier_depends(); /* Dependency order vs. p above. */ \
  _p1; \
  })
 
 Hmm...  Where would you prefer to put it?  rcupdate.h?

Good a place as any, I guess.  Please see patch below.  Left to myself,
I would send this along for the next merge window, but please let me
know if you would like it sooner.

Thanx, Paul



rcu: Provide counterpart to rcu_dereference() for non-RCU situations

Although rcu_dereference() and friends can be used in situations where
object lifetimes are being managed by something other than RCU, the
resulting sparse and lockdep-RCU noise can be annoying.  This commit
therefore supplies a lockless_dereference(), which provides the
protection for dereferences without the RCU-related debugging noise.

Reported-by: Al Viro v...@zeniv.linux.org.uk
Signed-off-by: Paul E. McKenney paul...@linux.vnet.ibm.com

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 9575c2d403b5..ed4f5939a452 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -617,6 +617,21 @@ static inline void rcu_preempt_sleep_check(void)
 #define RCU_INITIALIZER(v) (typeof(*(v)) __force __rcu *)(v)
 
 /**
+ * lockless_dereference() - safely load a pointer for later dereference
+ * @p: The pointer to load
+ *
+ * Similar to rcu_dereference(), but for situations where the pointed-to
+ * object's lifetime is managed by something other than RCU.  That
+ * something other might be reference counting or simple immortality.
+ */
+#define lockless_dereference(p) \
+({ \
+   typeof(p) _p1 = ACCESS_ONCE(p); \
+   smp_read_barrier_depends(); /* Dependency order vs. p above. */ \
+   (_p1); \
+})
+
+/**
  * rcu_assign_pointer() - assign to RCU-protected pointer
  * @p: pointer to assign to
  * @v: value to assign (publish)

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [GIT PULL] overlay filesystem v25

2014-10-25 Thread Al Viro
On Sat, Oct 25, 2014 at 11:53:52AM +0200, Miklos Szeredi wrote:

> Yes, but it's not about race with copy-up (which the ovl_path_upper()
> protects against), but race of two fsync calls with each other.  If
> there's no synchronization between them, then that od->upperfile does
> indeed count as lockless access, no matter that the assignment was
> done under lock.

p = global;
if (!p) {   // outside of lock
p = alloc();
grab lock
if (!global) {
global = p;
} else {
destroy(p);
p = global;
}
drop lock
}
is a very common pattern, especially if you look for cases when lock is
a spinlock and allocation is blocking (in those cases you'll often see
destroy() part done after dropping the lock; that's where what I fucked up in
what I'd originally pushed.  And it wasn't even needed - fput() under
->i_mutex is OK...)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [GIT PULL] overlay filesystem v25

2014-10-25 Thread Al Viro
On Fri, Oct 24, 2014 at 09:24:45AM +0200, Miklos Szeredi wrote:

> The reason I didn't do your "fix" is that it
> 
>  - adds more lines than it takes,
> 
>  - I wasn't sure at all if the lockless access is actually correct
> without the ACCESS_ONCE and all the memory barrier magic that might be
>  necessary on weird architectures.

_What_ lockless accesses?  There is an extremely embarrassing bug in that
commit, all right, but it has nothing to do with barriers...  All
barrier-related issues are taken care of by ovl_path_upper() (and without
that you'd have tons of worse problems).  Fetching ->upperfile outside of
->i_mutex is fine - in the worst case we'll fetch NULL, open the sucker
grab ->i_mutex and find out that it has already been taken care of.
In which case we fput() what we'd opened and move on (fput() under
->i_mutex is fine - it's going to be delayed until return from syscall
anyway).

There was a very dumb braino in there; fixed, force-pushed, passes unionmount
tests, no regressions on LTP syscall ones and xfstests.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [GIT PULL] overlay filesystem v25

2014-10-25 Thread Miklos Szeredi
On Sat, Oct 25, 2014 at 10:18 AM, Al Viro  wrote:
> On Fri, Oct 24, 2014 at 09:24:45AM +0200, Miklos Szeredi wrote:
>
>> The reason I didn't do your "fix" is that it
>>
>>  - adds more lines than it takes,
>>
>>  - I wasn't sure at all if the lockless access is actually correct
>> without the ACCESS_ONCE and all the memory barrier magic that might be
>>  necessary on weird architectures.
>
> _What_ lockless accesses?  There is an extremely embarrassing bug in that
> commit, all right, but it has nothing to do with barriers...  All
> barrier-related issues are taken care of by ovl_path_upper() (and without
> that you'd have tons of worse problems).  Fetching ->upperfile outside of
> ->i_mutex is fine - in the worst case we'll fetch NULL, open the sucker
> grab ->i_mutex and find out that it has already been taken care of.
> In which case we fput() what we'd opened and move on (fput() under
> ->i_mutex is fine - it's going to be delayed until return from syscall
> anyway).

Yes, but it's not about race with copy-up (which the ovl_path_upper()
protects against), but race of two fsync calls with each other.  If
there's no synchronization between them, then that od->upperfile does
indeed count as lockless access, no matter that the assignment was
done under lock.

Thanks,
Miklos
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [GIT PULL] overlay filesystem v25

2014-10-25 Thread Miklos Szeredi
On Sat, Oct 25, 2014 at 10:18 AM, Al Viro v...@zeniv.linux.org.uk wrote:
 On Fri, Oct 24, 2014 at 09:24:45AM +0200, Miklos Szeredi wrote:

 The reason I didn't do your fix is that it

  - adds more lines than it takes,

  - I wasn't sure at all if the lockless access is actually correct
 without the ACCESS_ONCE and all the memory barrier magic that might be
  necessary on weird architectures.

 _What_ lockless accesses?  There is an extremely embarrassing bug in that
 commit, all right, but it has nothing to do with barriers...  All
 barrier-related issues are taken care of by ovl_path_upper() (and without
 that you'd have tons of worse problems).  Fetching -upperfile outside of
 -i_mutex is fine - in the worst case we'll fetch NULL, open the sucker
 grab -i_mutex and find out that it has already been taken care of.
 In which case we fput() what we'd opened and move on (fput() under
 -i_mutex is fine - it's going to be delayed until return from syscall
 anyway).

Yes, but it's not about race with copy-up (which the ovl_path_upper()
protects against), but race of two fsync calls with each other.  If
there's no synchronization between them, then that od-upperfile does
indeed count as lockless access, no matter that the assignment was
done under lock.

Thanks,
Miklos
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [GIT PULL] overlay filesystem v25

2014-10-25 Thread Al Viro
On Fri, Oct 24, 2014 at 09:24:45AM +0200, Miklos Szeredi wrote:

 The reason I didn't do your fix is that it
 
  - adds more lines than it takes,
 
  - I wasn't sure at all if the lockless access is actually correct
 without the ACCESS_ONCE and all the memory barrier magic that might be
  necessary on weird architectures.

_What_ lockless accesses?  There is an extremely embarrassing bug in that
commit, all right, but it has nothing to do with barriers...  All
barrier-related issues are taken care of by ovl_path_upper() (and without
that you'd have tons of worse problems).  Fetching -upperfile outside of
-i_mutex is fine - in the worst case we'll fetch NULL, open the sucker
grab -i_mutex and find out that it has already been taken care of.
In which case we fput() what we'd opened and move on (fput() under
-i_mutex is fine - it's going to be delayed until return from syscall
anyway).

There was a very dumb braino in there; fixed, force-pushed, passes unionmount
tests, no regressions on LTP syscall ones and xfstests.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [GIT PULL] overlay filesystem v25

2014-10-25 Thread Al Viro
On Sat, Oct 25, 2014 at 11:53:52AM +0200, Miklos Szeredi wrote:

 Yes, but it's not about race with copy-up (which the ovl_path_upper()
 protects against), but race of two fsync calls with each other.  If
 there's no synchronization between them, then that od-upperfile does
 indeed count as lockless access, no matter that the assignment was
 done under lock.

p = global;
if (!p) {   // outside of lock
p = alloc();
grab lock
if (!global) {
global = p;
} else {
destroy(p);
p = global;
}
drop lock
}
is a very common pattern, especially if you look for cases when lock is
a spinlock and allocation is blocking (in those cases you'll often see
destroy() part done after dropping the lock; that's where what I fucked up in
what I'd originally pushed.  And it wasn't even needed - fput() under
-i_mutex is OK...)
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [GIT PULL] overlay filesystem v25

2014-10-24 Thread Geert Uytterhoeven
Hi Al,

On Fri, Oct 24, 2014 at 5:24 AM, Al Viro  wrote:
> On Fri, Oct 24, 2014 at 03:20:55AM +0100, Al Viro wrote:
>> Why the hell do you hold ->i_mutex across the entire opening of underlying
>> directory?  All you need is to serialize one assignment; the side that loses
>> the race will simply fput() what it opened...
>>
>> Oh, well - that goes under "weird pessimisations, easy to fix in 
>> followups"...
>
> OK, pulled into vfs.git, followups in question added.  Also there: fix for
> a long-standing leak in d_splice_alias() failure exits.  Guys, could you
> check that current vfs.git#for-linus survives your local tests?  Seems to
> survive here; if I don't hear of any problems by tomorrow morning, to Linus
> it goes...  FWIW, for that pull request stats would be

I think you forgot to merge your for-linus branch into for-next?

Anyway, as Stephen announced there will be no linux-next release
today, I'm afraid you'll have to delay your pull request to Tuesday.

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [GIT PULL] overlay filesystem v25

2014-10-24 Thread Miklos Szeredi
On Fri, Oct 24, 2014 at 5:24 AM, Al Viro  wrote:
> On Fri, Oct 24, 2014 at 03:20:55AM +0100, Al Viro wrote:
>> Why the hell do you hold ->i_mutex across the entire opening of underlying
>> directory?  All you need is to serialize one assignment; the side that loses
>> the race will simply fput() what it opened...

The reason I didn't do your "fix" is that it

 - adds more lines than it takes,

 - I wasn't sure at all if the lockless access is actually correct
without the ACCESS_ONCE and all the memory barrier magic that might be
necessary on weird architectures.

The rest of the changes look OK.

Thanks,
Miklos
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [GIT PULL] overlay filesystem v25

2014-10-24 Thread Miklos Szeredi
On Fri, Oct 24, 2014 at 5:24 AM, Al Viro v...@zeniv.linux.org.uk wrote:
 On Fri, Oct 24, 2014 at 03:20:55AM +0100, Al Viro wrote:
 Why the hell do you hold -i_mutex across the entire opening of underlying
 directory?  All you need is to serialize one assignment; the side that loses
 the race will simply fput() what it opened...

The reason I didn't do your fix is that it

 - adds more lines than it takes,

 - I wasn't sure at all if the lockless access is actually correct
without the ACCESS_ONCE and all the memory barrier magic that might be
necessary on weird architectures.

The rest of the changes look OK.

Thanks,
Miklos
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [GIT PULL] overlay filesystem v25

2014-10-24 Thread Geert Uytterhoeven
Hi Al,

On Fri, Oct 24, 2014 at 5:24 AM, Al Viro v...@zeniv.linux.org.uk wrote:
 On Fri, Oct 24, 2014 at 03:20:55AM +0100, Al Viro wrote:
 Why the hell do you hold -i_mutex across the entire opening of underlying
 directory?  All you need is to serialize one assignment; the side that loses
 the race will simply fput() what it opened...

 Oh, well - that goes under weird pessimisations, easy to fix in 
 followups...

 OK, pulled into vfs.git, followups in question added.  Also there: fix for
 a long-standing leak in d_splice_alias() failure exits.  Guys, could you
 check that current vfs.git#for-linus survives your local tests?  Seems to
 survive here; if I don't hear of any problems by tomorrow morning, to Linus
 it goes...  FWIW, for that pull request stats would be

I think you forgot to merge your for-linus branch into for-next?

Anyway, as Stephen announced there will be no linux-next release
today, I'm afraid you'll have to delay your pull request to Tuesday.

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say programmer or something like that.
-- Linus Torvalds
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [GIT PULL] overlay filesystem v25

2014-10-23 Thread Al Viro
On Fri, Oct 24, 2014 at 03:20:55AM +0100, Al Viro wrote:
> Why the hell do you hold ->i_mutex across the entire opening of underlying
> directory?  All you need is to serialize one assignment; the side that loses
> the race will simply fput() what it opened...
> 
> Oh, well - that goes under "weird pessimisations, easy to fix in followups"...

OK, pulled into vfs.git, followups in question added.  Also there: fix for
a long-standing leak in d_splice_alias() failure exits.  Guys, could you
check that current vfs.git#for-linus survives your local tests?  Seems to
survive here; if I don't hear of any problems by tomorrow morning, to Linus
it goes...  FWIW, for that pull request stats would be

Shortlog:
Al Viro (5):
  fix inode leaks on d_splice_alias() failure exits
  overlayfs: don't hold ->i_mutex over opening the real directory
  overlayfs: make ovl_cache_entry->name an array instead of pointer
  overlayfs: embed root into overlay_readdir_data
  overlayfs: embed middle into overlay_readdir_data

Andy Whitcroft (1):
  overlayfs: add statfs support

Erez Zadok (1):
  overlayfs: implement show_options

Miklos Szeredi (11):
  vfs: add i_op->dentry_open()
  vfs: export do_splice_direct() to modules
  vfs: export __inode_permission() to modules
  vfs: introduce clone_private_mount()
  vfs: export check_sticky()
  vfs: add whiteout support
  vfs: add RENAME_WHITEOUT
  ext4: support RENAME_WHITEOUT
  shmem: support RENAME_WHITEOUT
  overlay filesystem
  fs: limit filesystem stacking depth

Neil Brown (1):
  overlay: overlay filesystem documentation

Diffstat:
 Documentation/filesystems/Locking   |2 +
 Documentation/filesystems/overlayfs.txt |  198 +++
 Documentation/filesystems/vfs.txt   |7 +
 MAINTAINERS |7 +
 fs/Kconfig  |1 +
 fs/Makefile |1 +
 fs/btrfs/ioctl.c|   20 +-
 fs/dcache.c |2 +
 fs/ecryptfs/main.c  |7 +
 fs/ext4/namei.c |   95 +++-
 fs/internal.h   |7 -
 fs/namei.c  |   41 +-
 fs/namespace.c  |   27 +
 fs/open.c   |   23 +-
 fs/overlayfs/Kconfig|   10 +
 fs/overlayfs/Makefile   |7 +
 fs/overlayfs/copy_up.c  |  414 ++
 fs/overlayfs/dir.c  |  921 +++
 fs/overlayfs/inode.c|  425 ++
 fs/overlayfs/overlayfs.h|  191 +++
 fs/overlayfs/readdir.c  |  589 
 fs/overlayfs/super.c|  796 ++
 fs/splice.c |1 +
 include/linux/fs.h  |   39 ++
 include/linux/mount.h   |3 +
 include/uapi/linux/fs.h |1 +
 mm/shmem.c  |   36 +-
 27 files changed, 3813 insertions(+), 58 deletions(-)
 create mode 100644 Documentation/filesystems/overlayfs.txt
 create mode 100644 fs/overlayfs/Kconfig
 create mode 100644 fs/overlayfs/Makefile
 create mode 100644 fs/overlayfs/copy_up.c
 create mode 100644 fs/overlayfs/dir.c
 create mode 100644 fs/overlayfs/inode.c
 create mode 100644 fs/overlayfs/overlayfs.h
 create mode 100644 fs/overlayfs/readdir.c
 create mode 100644 fs/overlayfs/super.c
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [GIT PULL] overlay filesystem v25

2014-10-23 Thread Al Viro
On Fri, Oct 24, 2014 at 01:25:39AM +0200, Miklos Szeredi wrote:
> Linus,
> 
> Please pull
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs.git overlayfs.v25
> 
> Plenty of bugs fixed relative to the previous version, thanks to Al Viro's
> review and is now officially be BugFree(TM).  Also passes the
> unionmount-testsuite by David Howells.

*blink*

Why the hell do you hold ->i_mutex across the entire opening of underlying
directory?  All you need is to serialize one assignment; the side that loses
the race will simply fput() what it opened...

Oh, well - that goes under "weird pessimisations, easy to fix in followups"...
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[GIT PULL] overlay filesystem v25

2014-10-23 Thread Miklos Szeredi
Linus,

Please pull

  git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs.git overlayfs.v25

Plenty of bugs fixed relative to the previous version, thanks to Al Viro's
review and is now officially be BugFree(TM).  Also passes the
unionmount-testsuite by David Howells.

Thanks,
Miklos

---
Andy Whitcroft (1):
  overlayfs: add statfs support

Erez Zadok (1):
  overlayfs: implement show_options

Miklos Szeredi (11):
  vfs: add i_op->dentry_open()
  vfs: export do_splice_direct() to modules
  vfs: export __inode_permission() to modules
  vfs: introduce clone_private_mount()
  vfs: export check_sticky()
  vfs: add whiteout support
  vfs: add RENAME_WHITEOUT
  ext4: support RENAME_WHITEOUT
  shmem: support RENAME_WHITEOUT
  overlay filesystem
  fs: limit filesystem stacking depth

Neil Brown (1):
  overlay: overlay filesystem documentation

---
 Documentation/filesystems/Locking   |   2 +
 Documentation/filesystems/overlayfs.txt | 198 +++
 Documentation/filesystems/vfs.txt   |   7 +
 MAINTAINERS |   7 +
 fs/Kconfig  |   1 +
 fs/Makefile |   1 +
 fs/btrfs/ioctl.c|  20 +-
 fs/ecryptfs/main.c  |   7 +
 fs/ext4/namei.c |  95 +++-
 fs/internal.h   |   7 -
 fs/namei.c  |  41 +-
 fs/namespace.c  |  27 +
 fs/open.c   |  23 +-
 fs/overlayfs/Kconfig|  10 +
 fs/overlayfs/Makefile   |   7 +
 fs/overlayfs/copy_up.c  | 414 ++
 fs/overlayfs/dir.c  | 921 
 fs/overlayfs/inode.c| 425 +++
 fs/overlayfs/overlayfs.h| 191 +++
 fs/overlayfs/readdir.c  | 587 
 fs/overlayfs/super.c| 796 +++
 fs/splice.c |   1 +
 include/linux/fs.h  |  39 ++
 include/linux/mount.h   |   3 +
 include/uapi/linux/fs.h |   1 +
 mm/shmem.c  |  36 +-
 26 files changed, 3809 insertions(+), 58 deletions(-)
 create mode 100644 Documentation/filesystems/overlayfs.txt
 create mode 100644 fs/overlayfs/Kconfig
 create mode 100644 fs/overlayfs/Makefile
 create mode 100644 fs/overlayfs/copy_up.c
 create mode 100644 fs/overlayfs/dir.c
 create mode 100644 fs/overlayfs/inode.c
 create mode 100644 fs/overlayfs/overlayfs.h
 create mode 100644 fs/overlayfs/readdir.c
 create mode 100644 fs/overlayfs/super.c

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[GIT PULL] overlay filesystem v25

2014-10-23 Thread Miklos Szeredi
Linus,

Please pull

  git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs.git overlayfs.v25

Plenty of bugs fixed relative to the previous version, thanks to Al Viro's
review and is now officially be BugFree(TM).  Also passes the
unionmount-testsuite by David Howells.

Thanks,
Miklos

---
Andy Whitcroft (1):
  overlayfs: add statfs support

Erez Zadok (1):
  overlayfs: implement show_options

Miklos Szeredi (11):
  vfs: add i_op-dentry_open()
  vfs: export do_splice_direct() to modules
  vfs: export __inode_permission() to modules
  vfs: introduce clone_private_mount()
  vfs: export check_sticky()
  vfs: add whiteout support
  vfs: add RENAME_WHITEOUT
  ext4: support RENAME_WHITEOUT
  shmem: support RENAME_WHITEOUT
  overlay filesystem
  fs: limit filesystem stacking depth

Neil Brown (1):
  overlay: overlay filesystem documentation

---
 Documentation/filesystems/Locking   |   2 +
 Documentation/filesystems/overlayfs.txt | 198 +++
 Documentation/filesystems/vfs.txt   |   7 +
 MAINTAINERS |   7 +
 fs/Kconfig  |   1 +
 fs/Makefile |   1 +
 fs/btrfs/ioctl.c|  20 +-
 fs/ecryptfs/main.c  |   7 +
 fs/ext4/namei.c |  95 +++-
 fs/internal.h   |   7 -
 fs/namei.c  |  41 +-
 fs/namespace.c  |  27 +
 fs/open.c   |  23 +-
 fs/overlayfs/Kconfig|  10 +
 fs/overlayfs/Makefile   |   7 +
 fs/overlayfs/copy_up.c  | 414 ++
 fs/overlayfs/dir.c  | 921 
 fs/overlayfs/inode.c| 425 +++
 fs/overlayfs/overlayfs.h| 191 +++
 fs/overlayfs/readdir.c  | 587 
 fs/overlayfs/super.c| 796 +++
 fs/splice.c |   1 +
 include/linux/fs.h  |  39 ++
 include/linux/mount.h   |   3 +
 include/uapi/linux/fs.h |   1 +
 mm/shmem.c  |  36 +-
 26 files changed, 3809 insertions(+), 58 deletions(-)
 create mode 100644 Documentation/filesystems/overlayfs.txt
 create mode 100644 fs/overlayfs/Kconfig
 create mode 100644 fs/overlayfs/Makefile
 create mode 100644 fs/overlayfs/copy_up.c
 create mode 100644 fs/overlayfs/dir.c
 create mode 100644 fs/overlayfs/inode.c
 create mode 100644 fs/overlayfs/overlayfs.h
 create mode 100644 fs/overlayfs/readdir.c
 create mode 100644 fs/overlayfs/super.c

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [GIT PULL] overlay filesystem v25

2014-10-23 Thread Al Viro
On Fri, Oct 24, 2014 at 01:25:39AM +0200, Miklos Szeredi wrote:
 Linus,
 
 Please pull
 
   git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs.git overlayfs.v25
 
 Plenty of bugs fixed relative to the previous version, thanks to Al Viro's
 review and is now officially be BugFree(TM).  Also passes the
 unionmount-testsuite by David Howells.

*blink*

Why the hell do you hold -i_mutex across the entire opening of underlying
directory?  All you need is to serialize one assignment; the side that loses
the race will simply fput() what it opened...

Oh, well - that goes under weird pessimisations, easy to fix in followups...
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [GIT PULL] overlay filesystem v25

2014-10-23 Thread Al Viro
On Fri, Oct 24, 2014 at 03:20:55AM +0100, Al Viro wrote:
 Why the hell do you hold -i_mutex across the entire opening of underlying
 directory?  All you need is to serialize one assignment; the side that loses
 the race will simply fput() what it opened...
 
 Oh, well - that goes under weird pessimisations, easy to fix in followups...

OK, pulled into vfs.git, followups in question added.  Also there: fix for
a long-standing leak in d_splice_alias() failure exits.  Guys, could you
check that current vfs.git#for-linus survives your local tests?  Seems to
survive here; if I don't hear of any problems by tomorrow morning, to Linus
it goes...  FWIW, for that pull request stats would be

Shortlog:
Al Viro (5):
  fix inode leaks on d_splice_alias() failure exits
  overlayfs: don't hold -i_mutex over opening the real directory
  overlayfs: make ovl_cache_entry-name an array instead of pointer
  overlayfs: embed root into overlay_readdir_data
  overlayfs: embed middle into overlay_readdir_data

Andy Whitcroft (1):
  overlayfs: add statfs support

Erez Zadok (1):
  overlayfs: implement show_options

Miklos Szeredi (11):
  vfs: add i_op-dentry_open()
  vfs: export do_splice_direct() to modules
  vfs: export __inode_permission() to modules
  vfs: introduce clone_private_mount()
  vfs: export check_sticky()
  vfs: add whiteout support
  vfs: add RENAME_WHITEOUT
  ext4: support RENAME_WHITEOUT
  shmem: support RENAME_WHITEOUT
  overlay filesystem
  fs: limit filesystem stacking depth

Neil Brown (1):
  overlay: overlay filesystem documentation

Diffstat:
 Documentation/filesystems/Locking   |2 +
 Documentation/filesystems/overlayfs.txt |  198 +++
 Documentation/filesystems/vfs.txt   |7 +
 MAINTAINERS |7 +
 fs/Kconfig  |1 +
 fs/Makefile |1 +
 fs/btrfs/ioctl.c|   20 +-
 fs/dcache.c |2 +
 fs/ecryptfs/main.c  |7 +
 fs/ext4/namei.c |   95 +++-
 fs/internal.h   |7 -
 fs/namei.c  |   41 +-
 fs/namespace.c  |   27 +
 fs/open.c   |   23 +-
 fs/overlayfs/Kconfig|   10 +
 fs/overlayfs/Makefile   |7 +
 fs/overlayfs/copy_up.c  |  414 ++
 fs/overlayfs/dir.c  |  921 +++
 fs/overlayfs/inode.c|  425 ++
 fs/overlayfs/overlayfs.h|  191 +++
 fs/overlayfs/readdir.c  |  589 
 fs/overlayfs/super.c|  796 ++
 fs/splice.c |1 +
 include/linux/fs.h  |   39 ++
 include/linux/mount.h   |3 +
 include/uapi/linux/fs.h |1 +
 mm/shmem.c  |   36 +-
 27 files changed, 3813 insertions(+), 58 deletions(-)
 create mode 100644 Documentation/filesystems/overlayfs.txt
 create mode 100644 fs/overlayfs/Kconfig
 create mode 100644 fs/overlayfs/Makefile
 create mode 100644 fs/overlayfs/copy_up.c
 create mode 100644 fs/overlayfs/dir.c
 create mode 100644 fs/overlayfs/inode.c
 create mode 100644 fs/overlayfs/overlayfs.h
 create mode 100644 fs/overlayfs/readdir.c
 create mode 100644 fs/overlayfs/super.c
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/