Re: [PATCH v4 0/6] Avoid that scsi-mq and dm-mq queue processing stalls sporadically

2017-04-13 Thread Benjamin Block
On Wed, Apr 12, 2017 at 06:11:25PM +, Bart Van Assche wrote:
> On Wed, 2017-04-12 at 12:55 +0200, Benjamin Block wrote:
> > On Fri, Apr 07, 2017 at 11:16:48AM -0700, Bart Van Assche wrote:
> > > The six patches in this patch series fix the queue lockup I reported
> > > recently on the linux-block mailing list. Please consider these patches
> > > for inclusion in the upstream kernel.
> >
> > Just out of curiosity: is this maybe related to similar stuff happening
> > when CPUs are hot-plugged, at least in that the stack gets stuck? Like
> > in this thread here:
> > https://www.mail-archive.com/linux-block@vger.kernel.org/msg06057.html
> >
> > That would be interesting, because we recently saw similar stuff happening.
>
> Hello Benjamin,
>
> My proposal is to repeat that test with Jens' for-next branch. If the issue
> still occurs with that tree then please check the contents of
> /sys/kernel/debug/block/*/mq/*/{dispatch,*/rq_list}. That will allow you to
> determine whether or not any block layer requests are still pending. If
> running the command below resolves the deadlock then it means that a
> trigger to run a block layer queue is still missing somewhere:
>
> for a in /sys/kernel/debug/block/*/mq/state; do echo run >$a; done
>
> See also git://git.kernel.dk/linux-block.git.
>

Thanks for the hint! I'll forward that and see if the affected folks are
willing to reproduce.


Best regards,
  - Benjamin Block
--
Linux on z Systems Development / IBM Systems & Technology Group
  IBM Deutschland Research & Development GmbH
Chair of the Supervisory Board: Martina Koederitz / Management: Dirk Wittkopp
Registered office: Böblingen / Registration court: Amtsgericht Stuttgart, HRB 243294



Re: [PATCH v4 0/6] Avoid that scsi-mq and dm-mq queue processing stalls sporadically

2017-04-12 Thread Bart Van Assche
On Wed, 2017-04-12 at 12:55 +0200, Benjamin Block wrote:
> On Fri, Apr 07, 2017 at 11:16:48AM -0700, Bart Van Assche wrote:
> > The six patches in this patch series fix the queue lockup I reported
> > recently on the linux-block mailing list. Please consider these patches
> > for inclusion in the upstream kernel.
> 
> Just out of curiosity: is this maybe related to similar stuff happening
> when CPUs are hot-plugged, at least in that the stack gets stuck? Like
> in this thread here:
> https://www.mail-archive.com/linux-block@vger.kernel.org/msg06057.html
>
> That would be interesting, because we recently saw similar stuff happening.

Hello Benjamin,

My proposal is to repeat that test with Jens' for-next branch. If the issue
still occurs with that tree then please check the contents of
/sys/kernel/debug/block/*/mq/*/{dispatch,*/rq_list}. That will allow you to
determine whether or not any block layer requests are still pending. If
running the command below resolves the deadlock then it means that a
trigger to run a block layer queue is still missing somewhere:

for a in /sys/kernel/debug/block/*/mq/state; do echo run >$a; done

See also git://git.kernel.dk/linux-block.git.

Bart.
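
The debugging steps above can be sketched as a small POSIX shell helper.
This is only a sketch under assumptions: the function names show_pending
and kick_queues and the DEBUGFS_BLOCK variable are made up for
illustration, the paths simply follow the layout quoted in the message,
and the debugfs layout may differ between kernel versions. On a real
system it needs root and a mounted debugfs.

```shell
#!/bin/sh
# Sketch only. DEBUGFS_BLOCK and both function names are illustrative
# assumptions; the paths follow the layout quoted in the message above.
DEBUGFS_BLOCK=${DEBUGFS_BLOCK:-/sys/kernel/debug/block}

# Print every dispatch and rq_list file that still lists requests.
# debugfs files report a size of zero, so the content is read instead
# of relying on a size test such as [ -s ].
show_pending() {
    for f in "$DEBUGFS_BLOCK"/*/mq/*/dispatch \
             "$DEBUGFS_BLOCK"/*/mq/*/*/rq_list; do
        [ -r "$f" ] || continue          # skip unmatched glob patterns
        content=$(cat "$f" 2>/dev/null)
        [ -n "$content" ] || continue    # nothing pending in this file
        printf '== %s ==\n%s\n' "$f" "$content"
    done
}

# Ask the block layer to run every queue, as in the one-liner above.
kick_queues() {
    for a in "$DEBUGFS_BLOCK"/*/mq/state; do
        [ -w "$a" ] || continue          # skip unmatched glob patterns
        echo run > "$a"
    done
}
```

Calling show_pending first and kick_queues only when requests are listed
keeps the two observations separate: pending requests that start moving
after the kick would point at a missing queue run, as described above.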

Re: [PATCH v4 0/6] Avoid that scsi-mq and dm-mq queue processing stalls sporadically

2017-04-12 Thread Benjamin Block
On Fri, Apr 07, 2017 at 11:16:48AM -0700, Bart Van Assche wrote:
> Hello Jens,
> 
> The six patches in this patch series fix the queue lockup I reported
> recently on the linux-block mailing list. Please consider these patches
> for inclusion in the upstream kernel.
> 

Hey Bart,

Just out of curiosity: is this maybe related to similar stuff happening
when CPUs are hot-plugged, at least in that the stack gets stuck? Like
in this thread here:
https://www.mail-archive.com/linux-block@vger.kernel.org/msg06057.html

That would be interesting, because we recently saw similar stuff happening.


Best regards,
  - Benjamin Block
-- 
Linux on z Systems Development / IBM Systems & Technology Group
  IBM Deutschland Research & Development GmbH
Chair of the Supervisory Board: Martina Koederitz / Management: Dirk Wittkopp
Registered office: Böblingen / Registration court: Amtsgericht Stuttgart, HRB 243294



Re: [PATCH v4 0/6] Avoid that scsi-mq and dm-mq queue processing stalls sporadically

2017-04-07 Thread Jens Axboe
On 04/07/2017 12:39 PM, Bart Van Assche wrote:
> On Fri, 2017-04-07 at 11:33 -0700, Bart Van Assche wrote:
>> On Fri, 2017-04-07 at 12:23 -0600, Jens Axboe wrote:
>>> On 04/07/2017 12:16 PM, Bart Van Assche wrote:
>>>> Hello Jens,
>>>>
>>>> The six patches in this patch series fix the queue lockup I reported
>>>> recently on the linux-block mailing list. Please consider these patches
>>>> for inclusion in the upstream kernel.
>>>
>>> Some of this we need in 4.11, but not all of it. I can't be applying patches
>>> that "improve scalability" at this point.
>>>
>>> 4-6 looks like what we want for 4.11, I'll see if those apply directly. Then
>>> we can put 1-3 on top in 4.12, with the others pulled in first.
>>
>> Hello Jens,
>>
>> Please note that patch 2/6 is a bug fix. The current implementation of
>> blk_mq_sched_restart_queues() only considers hardware queues associated with
>> the same request queue as the hardware queue that has been passed as an
>> argument. If a tag set is shared across request queues - as is the case for
>> SCSI - then all request queues that share a tag set with the hctx argument
>> must be considered.
> 
> (replying to my own e-mail)
> 
> Hello Jens,
> 
> If you want I can split that patch into two patches - one that runs all hardware
> queues with which the tag set is shared and one that switches from rerunning
> all hardware queues to one hardware queue.

I already put it in, but this is getting very awkward. We're at -rc5 time, patches
going into mainline should be TINY. And now I'm sitting on this, that I have to
justify:

 15 files changed, 281 insertions(+), 164 deletions(-)

and where one of the patches reads like it's a performance improvement, when
in reality it's fixing a hang. So yes, the patch should have been split in
two, and the series should have been ordered so that the first patches could
go into 4.11, and the rest on top of that in 4.12. Did we really need a
patch clarifying comments in that series? Probably not.

-- 
Jens Axboe



Re: [PATCH v4 0/6] Avoid that scsi-mq and dm-mq queue processing stalls sporadically

2017-04-07 Thread Bart Van Assche
On Fri, 2017-04-07 at 11:33 -0700, Bart Van Assche wrote:
> On Fri, 2017-04-07 at 12:23 -0600, Jens Axboe wrote:
> > On 04/07/2017 12:16 PM, Bart Van Assche wrote:
> > > Hello Jens,
> > > 
> > > The six patches in this patch series fix the queue lockup I reported
> > > recently on the linux-block mailing list. Please consider these patches
> > > for inclusion in the upstream kernel.
> > 
> > Some of this we need in 4.11, but not all of it. I can't be applying patches
> > that "improve scalability" at this point.
> > 
> > 4-6 looks like what we want for 4.11, I'll see if those apply directly. Then
> > we can put 1-3 on top in 4.12, with the others pulled in first.
> 
> Hello Jens,
> 
> Please note that patch 2/6 is a bug fix. The current implementation of
> blk_mq_sched_restart_queues() only considers hardware queues associated with
> the same request queue as the hardware queue that has been passed as an
> argument. If a tag set is shared across request queues - as is the case for
> SCSI - then all request queues that share a tag set with the hctx argument
> must be considered.

(replying to my own e-mail)

Hello Jens,

If you want I can split that patch into two patches - one that runs all hardware
queues with which the tag set is shared and one that switches from rerunning
all hardware queues to one hardware queue.

Bart.

Re: [PATCH v4 0/6] Avoid that scsi-mq and dm-mq queue processing stalls sporadically

2017-04-07 Thread Bart Van Assche
On Fri, 2017-04-07 at 12:23 -0600, Jens Axboe wrote:
> On 04/07/2017 12:16 PM, Bart Van Assche wrote:
> > Hello Jens,
> > 
> > The six patches in this patch series fix the queue lockup I reported
> > recently on the linux-block mailing list. Please consider these patches
> > for inclusion in the upstream kernel.
> 
> Some of this we need in 4.11, but not all of it. I can't be applying patches
> that "improve scalability" at this point.
> 
> 4-6 looks like what we want for 4.11, I'll see if those apply directly. Then
> we can put 1-3 on top in 4.12, with the others pulled in first.

Hello Jens,

Please note that patch 2/6 is a bug fix. The current implementation of
blk_mq_sched_restart_queues() only considers hardware queues associated with
the same request queue as the hardware queue that has been passed as an
argument. If a tag set is shared across request queues - as is the case for
SCSI - then all request queues that share a tag set with the hctx argument
must be considered.

Bart.
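
The difference described above can be illustrated with a toy model in
POSIX shell. This is not kernel code: the TOPOLOGY table, the device
names and both function names are hypothetical, and the model only lists
which hardware queues would be candidates for a restart, not how one of
them is selected.

```shell
#!/bin/sh
# Toy model, not kernel code. Each TOPOLOGY line is
# "tag_set request_queue hctx"; all names are made up for illustration.
TOPOLOGY='ts0 sda hctx0
ts0 sdb hctx0
ts1 nvme0n1 hctx0'

# Behavior described as the bug: only hardware queues that belong to
# the given request queue are considered.
same_queue_hctxs() {
    printf '%s\n' "$TOPOLOGY" |
        awk -v q="$1" '$2 == q { print $2 ":" $3 }'
}

# Behavior described as the fix: consider the hardware queues of every
# request queue that shares a tag set with the given request queue.
shared_tag_set_hctxs() {
    printf '%s\n' "$TOPOLOGY" | awk -v q="$1" '
        { ts[NR] = $1; rq[NR] = $2; hc[NR] = $3 }
        $2 == q { want = $1 }   # remember the tag set of the given queue
        END {
            for (i = 1; i <= NR; i++)
                if (ts[i] == want)
                    print rq[i] ":" hc[i]
        }'
}
```

In this model, same_queue_hctxs sda lists only the hardware queue of
sda, while shared_tag_set_hctxs sda also lists the one of sdb because
both devices sit on the same tag set.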

Re: [PATCH v4 0/6] Avoid that scsi-mq and dm-mq queue processing stalls sporadically

2017-04-07 Thread Jens Axboe
On 04/07/2017 12:16 PM, Bart Van Assche wrote:
> Hello Jens,
> 
> The six patches in this patch series fix the queue lockup I reported
> recently on the linux-block mailing list. Please consider these patches
> for inclusion in the upstream kernel.

Some of this we need in 4.11, but not all of it. I can't be applying patches
that "improve scalability" at this point.

4-6 looks like what we want for 4.11, I'll see if those apply directly. Then
we can put 1-3 on top in 4.12, with the others pulled in first.

-- 
Jens Axboe



[PATCH v4 0/6] Avoid that scsi-mq and dm-mq queue processing stalls sporadically

2017-04-07 Thread Bart Van Assche
Hello Jens,

The six patches in this patch series fix the queue lockup I reported
recently on the linux-block mailing list. Please consider these patches
for inclusion in the upstream kernel.

Thanks,

Bart.

Changes between v3 and v4:
- Addressed the review comments on version three of this series about the
  patch that makes it safe to use RCU to iterate over .tag_list and also
  about the runtime performance and use of short variable names in patch 2/5.
- Clarified the description of the patch that fixes the scsi-mq stall.
- Added a patch to fix a dm-mq queue stall.
  
Changes between v2 and v3:
- Removed the blk_mq_ops.restart_hctx function pointer again.
- Modified blk_mq_sched_restart_queues() such that only a single hardware
  queue is restarted instead of multiple if hardware queues are shared.
- Introduced a new function in the block layer, namely
  blk_mq_delay_run_hw_queue().  

Changes between v1 and v2:
- Reworked scsi_restart_queues() such that it no longer takes the SCSI
  host lock.
- Added two patches - one for exporting blk_mq_sched_restart_hctx() and
  another one to make iterating with RCU over blk_mq_tag_set.tag_list safe.

Bart Van Assche (6):
  blk-mq: Make it safe to use RCU to iterate over
    blk_mq_tag_set.tag_list
  blk-mq: Restart a single queue if tag sets are shared
  blk-mq: Clarify comments in blk_mq_dispatch_rq_list()
  blk-mq: Introduce blk_mq_delay_run_hw_queue()
  scsi: Avoid that SCSI queues get stuck
  dm rq: Avoid that request processing stalls sporadically

 block/blk-mq-sched.c    | 63 +++---
 block/blk-mq-sched.h    | 16 +--
 block/blk-mq.c          | 73 +++--
 drivers/md/dm-rq.c      |  1 +
 drivers/scsi/scsi_lib.c |  6 ++--
 include/linux/blk-mq.h  |  2 ++
 include/linux/blkdev.h  |  1 -
 7 files changed, 118 insertions(+), 44 deletions(-)

-- 
2.12.0