Hi Oleg

On Fri, Aug 28, 2015 at 06:06:37PM +0200, Oleg Nesterov wrote:
> On 08/28, Michal Hocko wrote:
> >
> > On Thu 27-08-15 20:26:54, Oleg Nesterov wrote:
> > > On 08/27, Michal Hocko wrote:
> > > >
> > > > --- a/Documentation/memory-barriers.txt
> > > > +++ b/Documentation/memory-barriers.txt
> > > > @@ -2031,6 +2031,9 @@ something up.  The barrier occurs before the task 
> > > > state is cleared, and so sits
> > > >             <general barrier>             STORE current->state
> > > >         LOAD event_indicated
> > > >
> > > > +Please note that wake_up_process is an exception here because it 
> > > > implies
> > > > +the write memory barrier unconditionally.
> > > > +
> > >
> > > I simply can't understand (can't even parse) this part of 
> > > memory-barriers.txt.
> >
> > Do you mean the added text or the example above it?
> 
> Both ;)
> 
> but note that "load from X might see 0" is true of course, and in this

By this, I think you actually means the example below the added text,
i.e. the example for "to repeat..", right?

I think that's not a good example, and actually that example explains
that the barriers in -wait_event()- rather than in -wake_up()- are
conditional.

I wrote a patch to make things more clear, hope that helps.

(Add Paul and Jonathan to CCed)

Regards,
Boqun

> sense wake_up_process() is not exception:
> 
>       X = 1;
>       wmb();  // unless I am totally confused this just adds more confusion
>       Y = 1;
>       wake_up_process(TASK);
> 
> vs TASK doing
> 
>       for (;;) {
>               set_current_state(...);
>               if (Y)
>                       break;
>               schedule();
>       }
> 
>       BUG_ON(X == 0)
> 
> is not correct, it can actually can hit the BUG_ON() above. However, if
> wake_up_process() actually wakes a sleeping TASK up, then it should also
> see X = 1. Even without wmb(), even if we do
> 
>       Y = 1;
>       X = 1;
>       wake_up_process(TASK);
> 

----------->8
Subject: Documentation: call out conditional barriers of wait_*() and wake_up*()

The memory barriers in some sleep and wakeup functions are conditional,
there are several situations that there is no barriers:

1.      If wait_event() and co. actually don't prepare to sleep, there
        may be no barrier in them.

2.      No matter whether a sleep occurs or not, there may be no barrier
        between a successful wait-condition checking(the result of which
        is true) in wait_event() and the following instructions.

3.      If wake_up() and co. actually wake up no one, there may be no
        write barrier in them.

However, the current version of memory-barriers.txt doesn't call these
out. Further more, the example which wanted to explain that write
barriers in wake_up*() are conditional fails its job. To see that,
consider a similar example:

        CPU 1                           CPU 2
        =============================== ===============================
        X = 1;
        smp_mb();
        Y = 1;                          wait_event(wq, Y == 1);
                                          load from Y sees 1, no memory barrier
        smp_wmb();
        wake_up();
                                        load from X might see 0

CPU 2's load from X might still be 0 even if we add a write barrier
explicitly, which means that even if wake_up() guarantees a write
barrier, the original example still doesn't have the desired order
guarantee. In fact, the original example can explains the conditionality
of barriers in wait_*() rather than wake_up*().

This patch makes things clear, by calling out that the barriers in
wait_*() and wake_up*() are conditional and giving exact examples to
explain that.

Signed-off-by: Boqun Feng <boqun.f...@gmail.com>
---

 Documentation/memory-barriers.txt | 81 +++++++++++++++++++++++++++++++--------
 1 file changed, 65 insertions(+), 16 deletions(-)

diff --git a/Documentation/memory-barriers.txt 
b/Documentation/memory-barriers.txt
index eafa6a5..df69841 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -1975,7 +1975,8 @@ set_current_state() may be wrapped by:
 
 which therefore also imply a general memory barrier after setting the state.
 The whole sequence above is available in various canned forms, all of which
-interpolate the memory barrier in the right place:
+imply a general barrier if and only if a sleep is at least about to happen,
+i.e. prepare_to_wait*() is called.
 
        wait_event();
        wait_event_interruptible();
@@ -1986,6 +1987,9 @@ interpolate the memory barrier in the right place:
        wait_on_bit();
        wait_on_bit_lock();
 
+Further more, no barrier is guaranteed after the successful wait condition
+checkings, whose results are true, in wait_*() and before the instructions
+following wait_*().
 
 Secondly, code that performs a wake up normally follows something like this:
 
@@ -2009,21 +2013,6 @@ between the STORE to indicate the event and the STORE to 
set TASK_RUNNING:
            <general barrier>             STORE current->state
        LOAD event_indicated
 
-To repeat, this write memory barrier is present if and only if something
-is actually awakened.  To see this, consider the following sequence of
-events, where X and Y are both initially zero:
-
-       CPU 1                           CPU 2
-       =============================== ===============================
-       X = 1;                          STORE event_indicated
-       smp_mb();                       wake_up();
-       Y = 1;                          wait_event(wq, Y == 1);
-       wake_up();                        load from Y sees 1, no memory barrier
-                                       load from X might see 0
-
-In contrast, if a wakeup does occur, CPU 2's load from X would be guaranteed
-to see 1.
-
 The available waker functions include:
 
        complete();
@@ -2042,6 +2031,66 @@ The available waker functions include:
        wake_up_poll();
        wake_up_process();
 
+To repeat, barriers in wait_event(), wake_up() and co. are conditional, meaning
+they are present if and only if a sleep or a wakeup actually occurs. To see
+this, consider the following three examples.
+
+The first example is for the conditional barriers in wait_*(), say X and Y
+are both initially zero:
+
+       CPU 1                           CPU 2
+       =============================== ===============================
+                                       X = 1;
+                                       smp_mb();
+       wait_event(wq, Y == 1);         Y = 1;
+         load from Y sees 1            wake_up();
+         <no memory barrier>
+       load from X might see 0
+
+And even if a sleep and a wakeup really occurs, there might be no barrier
+between the last load from Y(which the makes wait condition checking succeed)
+and load from X, so that load from X might still see 0. The second example 
shows
+that, the code running on CPU 1 & 2 is the same with the first example, however
+the sequence of events are different, say X, Y and Z are all initially zero,
+and another task may be waiting in wait_event(wq, Z == 1) on CPU 4:
+
+       CPU 1                           CPU 2                           CPU 3
+       =============================== =============================== 
===============================
+       wait_event(wq, Y == 1);                                         Z = 1;
+         load from Y sees 0
+         prepare_to_wait_event();
+           <general barrier>
+         schedule();
+           <general barrier>
+                                                                       
wake_up();
+         prepare_to_wait_event();
+           <general barrier>
+                                       X = 1;
+                                       smp_mb();
+         load from Y sees 1            Y = 1;
+         <no memory barrier>           wake_up();
+       load from X might see 0
+
+So that, to guarantee that CPU 1's load from X is 1 in the two examples above,
+a read barrier must be added after wait_event() in the code running on CPU 1.
+
+The third example is for the conditional write barriers in wake_up*(), say
+X, Y and Z are all initially zero:
+
+       CPU 1                           CPU 2
+       =============================== ===============================
+                                       X = 1;
+       wait_event(wq, Y == 1);         Y = 1;
+         load from Y sees 1            wake_up();
+                                         <no memory barrier>
+                                       Z = 1;
+       load from Z sees 1
+       smp_rmb();
+       load from X might see 0
+
+In contrast, if a wakeup does occur, CPU 1's load from X would be guaranteed
+to see 1. To guarantee that CPU 1's load from X is 1, a write barrier must be
+added between store to X and store to Z in the code running on CPU 2.
 
 [!] Note that the memory barriers implied by the sleeper and the waker do _not_
 order multiple stores before the wake-up with respect to loads of those stored

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Reply via email to