On Mon, Sep 25, 2017 at 11:13:47PM +0000, Bart Van Assche wrote:
> On Tue, 2017-09-26 at 06:59 +0800, Ming Lei wrote:
> > On Mon, Sep 25, 2017 at 01:29:24PM -0700, Bart Van Assche wrote:
> > > +int blk_queue_enter(struct request_queue *q, bool nowait, bool preempt)
> > > {
> > > while (true) {
> > > int ret;
> > >
> > > - if (percpu_ref_tryget_live(&q->q_usage_counter))
> > > - return 0;
> > > + if (percpu_ref_tryget_live(&q->q_usage_counter)) {
> > > + /*
> > > + * Since setting the PREEMPT_ONLY flag is followed
> > > + * by a switch of q_usage_counter from per-cpu to
> > > + * atomic mode and back to per-cpu and since the
> > > + * switch to atomic mode uses call_rcu_sched(), it
> > > + * is not necessary to call smp_rmb() here.
> > > + */
> >
> > rcu_read_lock is held only inside percpu_ref_tryget_live().
> >
> > Without one explicit barrier(smp_mb) between getting the refcounter
> > and reading the preempt only flag, the two operations(writing to
> > refcounter and reading the flag) can be reordered, so
> > unfreeze/unfreeze may be completed before this IO is completed.
>
> Sorry but I disagree. I'm using RCU to achieve the same effect as a barrier
> and to move the cost of the barrier from the reader to the updater. See also
> Paul E. McKenney, Mathieu Desnoyers, Lai Jiangshan, and Josh Triplett,
> The RCU-barrier menagerie, LWN.net, November 12, 2013
> (https://lwn.net/Articles/573497/).
Let me explain it in a bit details:
1) in SCSI quiesce path:
We can think there is one synchronize_rcu() in freeze/unfreeze.
Let's see the RCU document:
Documentation/RCU/whatisRCU.txt:
void synchronize_rcu(void);
Marks the end of updater code and the beginning of reclaimer
code. It does this by blocking until all pre-existing RCU
read-side critical sections on all CPUs have completed.
Note that synchronize_rcu() will -not- necessarily wait for
any subsequent RCU read-side critical sections to complete.
For example, consider the following sequence of events:
So synchronize_rcu() in SCSI quiesce path just waits for completion
of pre-existing read-side critical section, and subsequent RCU
read-side critical sections won't be waited.
2) in normal I/O path of blk_enter_queue()
- only rcu read lock is held in percpu_ref_tryget_live(), and the lock
is released when this helper returns.
- there isn't explicit barrier(smp_mb()) between percpu_ref_tryget_live()
and checking flag of preempt only, so writing to percpu_ref counter
and reading the preempt flag can be reordered as the following:
-- check flag of preempt only
current process preempt out now, and just at the exact
time, SCSI quiesce is run from another process, then
freeze/unfreeze is completed because no pre-exit read-side
critical sections, and the percpu_ref isn't held too.
finally this process is preeempt in, and try to grab one ref
and submit I/O after SCSI is quiesced(which shouldn't be
allowed)
-- percpu_ref_tryget_live()
--
Ming