[PATCH V4 7/7] nvme: pci: support nested EH

2018-05-05 Thread Ming Lei
de> Cc: Sagi Grimberg <s...@grimberg.me> Cc: linux-n...@lists.infradead.org Cc: Laurence Oberman <lober...@redhat.com> Signed-off-by: Ming Lei <ming@redhat.com> --- drivers/nvme/host/core.c | 26 drivers/nvme/host/nvme.h | 2 + drivers/nvme/host/pci.c | 161 +

[PATCH V4 6/7] nvme: pci: prepare for supporting error recovery from resetting context

2018-05-05 Thread Ming Lei
. Cc: Jianchao Wang <jianchao.w.w...@oracle.com> Cc: Christoph Hellwig <h...@lst.de> Cc: Sagi Grimberg <s...@grimberg.me> Cc: linux-n...@lists.infradead.org Cc: Laurence Oberman <lober...@redhat.com> Signed-off-by: Ming Lei <ming@redhat.com> ---

[PATCH V4 4/7] nvme: pci: freeze queue in nvme_dev_disable() in case of error recovery

2018-05-05 Thread Ming Lei
Sagi Grimberg <s...@grimberg.me> Cc: linux-n...@lists.infradead.org Cc: Laurence Oberman <lober...@redhat.com> Signed-off-by: Ming Lei <ming@redhat.com> --- drivers/nvme/host/pci.c | 47 --- 1 file changed, 32 insertions(+), 15 deletions(

[PATCH V4 5/7] nvme: core: introduce 'reset_lock' for sync reset state and reset activities

2018-05-05 Thread Ming Lei
Cc: Christoph Hellwig <h...@lst.de> Cc: Sagi Grimberg <s...@grimberg.me> Cc: linux-n...@lists.infradead.org Cc: Laurence Oberman <lober...@redhat.com> Signed-off-by: Ming Lei <ming@redhat.com> --- drivers/nvme/host/core.c | 20 +--- drivers/nvme

[PATCH V4 2/7] nvme: pci: cover timeout for admin commands running in EH

2018-05-05 Thread Ming Lei
..@lst.de> Cc: Sagi Grimberg <s...@grimberg.me> Cc: linux-n...@lists.infradead.org Cc: Laurence Oberman <lober...@redhat.com> Signed-off-by: Ming Lei <ming@redhat.com> --- drivers/nvme/host/pci.c | 81 ++--- 1 file changed,

[PATCH V4 3/7] nvme: pci: only wait freezing if queue is frozen

2018-05-05 Thread Ming Lei
grimberg.me> Cc: linux-n...@lists.infradead.org Cc: Laurence Oberman <lober...@redhat.com> Signed-off-by: Ming Lei <ming@redhat.com> --- drivers/nvme/host/pci.c | 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/drivers/nvme/host/pci.c b/drivers/nvme

[PATCH V4 0/7] nvme: pci: fix & improve timeout handling

2018-05-05 Thread Ming Lei
, and avoid to break nvme_reset_ctrl*() V2: - fix draining timeout work, so no need to change return value from .timeout() - fix race between nvme_start_freeze() and nvme_unfreeze() - cover timeout for admin commands running in EH Ming Lei (7): block: introduce

[PATCH V4 1/7] block: introduce blk_quiesce_timeout() and blk_unquiesce_timeout()

2018-05-05 Thread Ming Lei
llwig <h...@lst.de> Cc: Sagi Grimberg <s...@grimberg.me> Cc: linux-n...@lists.infradead.org Cc: Laurence Oberman <lober...@redhat.com> Signed-off-by: Ming Lei <ming@redhat.com> --- block/blk-core.c | 21 +++-- block/blk-mq.c | 9 +

Re: [PATCH V3 7/8] nvme: pci: recover controller reliably

2018-05-04 Thread Ming Lei
On Fri, May 4, 2018 at 4:28 PM, jianchao.wang <jianchao.w.w...@oracle.com> wrote: > Hi ming > > On 05/04/2018 04:02 PM, Ming Lei wrote: >>> nvme_error_handler should invoke nvme_reset_ctrl instead of introducing >>> another interface. >>

Re: [PATCH V3 7/8] nvme: pci: recover controller reliably

2018-05-04 Thread Ming Lei
On Fri, May 04, 2018 at 04:28:23PM +0800, jianchao.wang wrote: > Hi ming > > On 05/04/2018 04:02 PM, Ming Lei wrote: > >> nvme_error_handler should invoke nvme_reset_ctrl instead of introducing > >> another interface. > >> Then it is more convenient t

Re: [PATCH V3 7/8] nvme: pci: recover controller reliably

2018-05-04 Thread Ming Lei
On Fri, May 04, 2018 at 02:10:19PM +0800, jianchao.wang wrote: > Hi ming > > On 05/04/2018 12:24 PM, Ming Lei wrote: > >> Just invoke nvme_dev_disable in nvme_error_handler context and hand over > >> the other things > >> to nvme_reset_work as the v2 patch se

Re: [PATCH V3 7/8] nvme: pci: recover controller reliably

2018-05-03 Thread Ming Lei
On Thu, May 03, 2018 at 11:46:56PM +0800, jianchao.wang wrote: > Hi Ming > > Thanks for your kindly response. > > On 05/03/2018 06:08 PM, Ming Lei wrote: > > nvme_eh_reset() can move on, if controller state is either CONNECTING or > > RESETTING, nvme_change_ctrl_state

Re: [PATCH V3 7/8] nvme: pci: recover controller reliably

2018-05-03 Thread Ming Lei
On Thu, May 03, 2018 at 05:14:30PM +0800, jianchao.wang wrote: > Hi ming > > On 05/03/2018 11:17 AM, Ming Lei wrote: > > static int io_queue_depth_set(const char *val, const struct kernel_param > > *kp) > > @@ -1199,7 +1204,7 @@ static enum blk_eh_timer_return nvme_

[PATCH V3 8/8] nvme: pci: simplify timeout handling

2018-05-02 Thread Ming Lei
is much simpler than current concurrent timeout model. Cc: Jianchao Wang <jianchao.w.w...@oracle.com> Cc: Christoph Hellwig <h...@lst.de> Cc: Sagi Grimberg <s...@grimberg.me> Cc: linux-n...@lists.infradead.org Cc: Laurence Oberman <lober...@redhat.com> Signed-off-by

[PATCH V3 7/8] nvme: pci: recover controller reliably

2018-05-02 Thread Ming Lei
, so that timeout even won't quiesce queue any more during draining IO Cc: Jianchao Wang <jianchao.w.w...@oracle.com> Cc: Christoph Hellwig <h...@lst.de> Cc: Sagi Grimberg <s...@grimberg.me> Cc: linux-n...@lists.infradead.org Cc: Laurence Oberman <lober...@redhat.com> S

[PATCH V3 6/8] nvme: pci: split controller resetting into two parts

2018-05-02 Thread Ming Lei
edhat.com> Signed-off-by: Ming Lei <ming@redhat.com> --- drivers/nvme/host/pci.c | 37 +++-- 1 file changed, 27 insertions(+), 10 deletions(-) diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c index ef80e064a62c..16d7507bfd79 100644 --- a

[PATCH V3 3/8] nvme: pci: only wait freezing if queue is frozen

2018-05-02 Thread Ming Lei
grimberg.me> Cc: linux-n...@lists.infradead.org Cc: Laurence Oberman <lober...@redhat.com> Signed-off-by: Ming Lei <ming@redhat.com> --- drivers/nvme/host/pci.c | 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/drivers/nvme/host/pci.c b/drivers/nvme

[PATCH V3 4/8] nvme: pci: freeze queue in nvme_dev_disable() in case of error recovery

2018-05-02 Thread Ming Lei
Sagi Grimberg <s...@grimberg.me> Cc: linux-n...@lists.infradead.org Cc: Laurence Oberman <lober...@redhat.com> Signed-off-by: Ming Lei <ming@redhat.com> --- drivers/nvme/host/pci.c | 47 --- 1 file changed, 32 insertions(+), 15 deletions(

[PATCH V3 5/8] nvme: fix race between freeze queues and unfreeze queues

2018-05-02 Thread Ming Lei
..@lst.de> Cc: Sagi Grimberg <s...@grimberg.me> Cc: linux-n...@lists.infradead.org Cc: Laurence Oberman <lober...@redhat.com> Signed-off-by: Ming Lei <ming@redhat.com> --- drivers/nvme/host/core.c | 29 + drivers/nvme/host/nvme.h | 3 +++ 2 file

[PATCH V3 2/8] nvme: pci: cover timeout for admin commands running in EH

2018-05-02 Thread Ming Lei
..@lst.de> Cc: Sagi Grimberg <s...@grimberg.me> Cc: linux-n...@lists.infradead.org Cc: Laurence Oberman <lober...@redhat.com> Signed-off-by: Ming Lei <ming@redhat.com> --- drivers/nvme/host/pci.c | 77 ++--- 1 file changed,

[PATCH V3 0/8] nvme: pci: fix & improve timeout handling

2018-05-02 Thread Ming Lei
nvme_start_freeze() and nvme_unfreeze() - cover timeout for admin commands running in EH Ming Lei (8): block: introduce blk_quiesce_timeout() and blk_unquiesce_timeout() nvme: pci: cover timeout for admin commands running in EH nvme: pci: only wait freezing if queue is frozen nvme: pci: freeze

[PATCH V3 1/8] block: introduce blk_quiesce_timeout() and blk_unquiesce_timeout()

2018-05-02 Thread Ming Lei
llwig <h...@lst.de> Cc: Sagi Grimberg <s...@grimberg.me> Cc: linux-n...@lists.infradead.org Cc: Laurence Oberman <lober...@redhat.com> Signed-off-by: Ming Lei <ming@redhat.com> --- block/blk-core.c | 21 +++-- block/blk-mq.c | 9 +

Re: Performance drop due to "blk-mq-sched: improve sequential I/O performance"

2018-05-02 Thread Ming Lei
On Wed, May 02, 2018 at 03:32:53PM +0530, Kashyap Desai wrote: > > -Original Message- > > From: Ming Lei [mailto:ming@redhat.com] > > Sent: Wednesday, May 2, 2018 3:17 PM > > To: Kashyap Desai > > Cc: linux-s...@vger.kernel.org; linux-block@vger.kernel.or

Re: Performance drop due to "blk-mq-sched: improve sequential I/O performance"

2018-05-02 Thread Ming Lei
On Wed, May 02, 2018 at 01:13:34PM +0530, Kashyap Desai wrote: > Hi Ming, > > I was running some performance test on latest 4.17-rc and figure out > performance drop (approximate 15% drop) due to below patch set. > https://marc.info/?l=linux-block=150802309522847=2 > > I observed drop on latest

Re: [PATCH V2 5/5] nvme: pci: simplify timeout handling

2018-05-02 Thread Ming Lei
On Wed, May 02, 2018 at 01:12:57PM +0800, jianchao.wang wrote: > Hi Ming > > On 05/02/2018 12:54 PM, Ming Lei wrote: > >> We need to return BLK_EH_RESET_TIMER in nvme_timeout then: > >> 1. defer the completion. we can't unmap the io request before close the

Re: [PATCH V2 5/5] nvme: pci: simplify timeout handling

2018-05-01 Thread Ming Lei
On Wed, May 02, 2018 at 10:23:20AM +0800, jianchao.wang wrote: > Hi Ming > > On 04/29/2018 11:41 PM, Ming Lei wrote: > > + > > static enum blk_eh_timer_return nvme_timeout(struct request *req, bool > > reserved) > > { > > struct nvme_iod *iod = blk_mq

Re: [PATCH V2 1/5] block: introduce blk_quiesce_timeout() and blk_unquiesce_timeout()

2018-05-01 Thread Ming Lei
On Wed, May 02, 2018 at 10:23:35AM +0800, jianchao.wang wrote: > Hi ming > > On 04/29/2018 11:41 PM, Ming Lei wrote: > > > > +static void __blk_unquiesce_timeout(struct request_queue *q) > > +{ > > + unsigned long flags; > > + > > + spin_

Re: [PATCH 1/2] nvme: pci: simplify timeout handling

2018-04-30 Thread Ming Lei
On Mon, Apr 30, 2018 at 01:52:17PM -0600, Keith Busch wrote: > On Sun, Apr 29, 2018 at 05:39:52AM +0800, Ming Lei wrote: > > On Sat, Apr 28, 2018 at 9:35 PM, Keith Busch > > <keith.bu...@linux.intel.com> wrote: > > > On Sat, Apr 28, 2018 at 11:50:17AM +0800, Ming Le

[PATCH V2 1/5] block: introduce blk_quiesce_timeout() and blk_unquiesce_timeout()

2018-04-29 Thread Ming Lei
llwig <h...@lst.de> Cc: Sagi Grimberg <s...@grimberg.me> Cc: linux-n...@lists.infradead.org Signed-off-by: Ming Lei <ming@redhat.com> --- block/blk-core.c | 34 -- block/blk-mq.c | 9 + block/blk-timeout.c| 5 -

[PATCH V2 3/5] nvme: pci: only wait freezing if queue is frozen

2018-04-29 Thread Ming Lei
grimberg.me> Cc: linux-n...@lists.infradead.org Signed-off-by: Ming Lei <ming@redhat.com> --- drivers/nvme/host/pci.c | 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c index 0e6cd605164a..8172ee584130 100644

[PATCH V2 5/5] nvme: pci: simplify timeout handling

2018-04-29 Thread Ming Lei
rg.me> Cc: linux-n...@lists.infradead.org Signed-off-by: Ming Lei <ming@redhat.com> --- drivers/nvme/host/core.c | 22 drivers/nvme/host/nvme.h | 3 + drivers/nvme/host/pci.c | 140 +++ 3 files changed, 155 insertions(+), 10 deletions(-) dif

[PATCH V2 4/5] nvme: fix race between freeze queues and unfreeze queues

2018-04-29 Thread Ming Lei
..@lst.de> Cc: Sagi Grimberg <s...@grimberg.me> Cc: linux-n...@lists.infradead.org Signed-off-by: Ming Lei <ming@redhat.com> --- drivers/nvme/host/core.c | 29 + drivers/nvme/host/nvme.h | 3 +++ 2 files changed, 24 insertions(+), 8 deletions(-) diff -

[PATCH V2 2/5] nvme: pci: cover timeout for admin commands running in EH

2018-04-29 Thread Ming Lei
..@lst.de> Cc: Sagi Grimberg <s...@grimberg.me> Cc: linux-n...@lists.infradead.org Signed-off-by: Ming Lei <ming@redhat.com> --- drivers/nvme/host/pci.c | 77 ++--- 1 file changed, 66 insertions(+), 11 deletions(-) diff --git a/drivers/n

[PATCH V2 0/5] nvme: pci: fix & improve timeout handling

2018-04-29 Thread Ming Lei
, and finally can make blktests block/011 passed. Ming Lei (5): block: introduce blk_quiesce_timeout() and blk_unquiesce_timeout() nvme: pci: cover timeout for admin commands running in EH nvme: pci: only wait freezing if queue is frozen nvme: fix race between freeze queues and unfreeze

Re: [PATCH 1/2] nvme: pci: simplify timeout handling

2018-04-29 Thread Ming Lei
On Sun, Apr 29, 2018 at 10:21 AM, jianchao.wang <jianchao.w.w...@oracle.com> wrote: > Hi ming > > On 04/29/2018 09:36 AM, Ming Lei wrote: >> On Sun, Apr 29, 2018 at 6:27 AM, Ming Lei <tom.leim...@gmail.com> wrote: >>> On Sun, Apr 29, 2018 at 5:57 AM, Mi

Re: [PATCH 1/2] nvme: pci: simplify timeout handling

2018-04-28 Thread Ming Lei
On Sun, Apr 29, 2018 at 6:27 AM, Ming Lei <tom.leim...@gmail.com> wrote: > On Sun, Apr 29, 2018 at 5:57 AM, Ming Lei <tom.leim...@gmail.com> wrote: >> On Sat, Apr 28, 2018 at 10:00 PM, jianchao.wang >> <jianchao.w.w...@oracle.com> wrote: >>> Hi ming >

Re: [PATCH 1/2] nvme: pci: simplify timeout handling

2018-04-28 Thread Ming Lei
On Sat, Apr 28, 2018 at 10:00 PM, jianchao.wang <jianchao.w.w...@oracle.com> wrote: > Hi ming > > On 04/27/2018 10:57 PM, Ming Lei wrote: >> I may not understand your point, once blk_sync_queue() returns, the >> timer itself is deactivated, meantime the synced .nvm

Re: [PATCH 1/2] nvme: pci: simplify timeout handling

2018-04-28 Thread Ming Lei
On Sat, Apr 28, 2018 at 9:35 PM, Keith Busch <keith.bu...@linux.intel.com> wrote: > On Sat, Apr 28, 2018 at 11:50:17AM +0800, Ming Lei wrote: >> > I understand how the problems are happening a bit better now. It used >> > to be that blk-mq would lock an expired com

Re: [PATCH 0/3] scsi: scsi-mq: don't hold host_busy in IO path

2018-04-28 Thread Ming Lei
On Fri, Apr 27, 2018 at 09:39:47AM -0600, Jens Axboe wrote: > On 4/27/18 9:31 AM, Bart Van Assche wrote: > > On Fri, 2018-04-20 at 14:57 +0800, Ming Lei wrote: > >> This patches removes the expensive atomic opeation on host-wide counter > >> of .host_busy for scsi-mq, a

Re: [PATCH 3/3] scsi: avoid to hold host-wide counter of host_busy for scsi_mq

2018-04-28 Thread Ming Lei
On Fri, Apr 27, 2018 at 04:16:48PM +, Bart Van Assche wrote: > On Fri, 2018-04-20 at 14:57 +0800, Ming Lei wrote: > > +struct scsi_host_mq_in_flight { > > + int cnt; > > +}; > > + > > +static void scsi_host_check_in_flight(struct request *rq, void *da

Re: [PATCH 2/3] scsi: read host_busy via scsi_host_busy()

2018-04-28 Thread Ming Lei
On Fri, Apr 27, 2018 at 03:51:46PM +, Bart Van Assche wrote: > On Fri, 2018-04-20 at 14:57 +0800, Ming Lei wrote: > > show_host_busy(struct device *dev, struct device_attribute *attr, char > > *buf) > > { > > struct Scsi_Host *shost = class_to_shost(dev); >

Re: [PATCH 1/2] nvme: pci: simplify timeout handling

2018-04-27 Thread Ming Lei
On Fri, Apr 27, 2018 at 11:51:57AM -0600, Keith Busch wrote: > On Thu, Apr 26, 2018 at 08:39:55PM +0800, Ming Lei wrote: > > +/* > > + * This one is called after queues are quiesced, and no in-fligh timeout > > + * and nvme interrupt handling. > > + */ > > +st

Re: [PATCH 2/2] nvme: pci: guarantee EH can make progress

2018-04-27 Thread Ming Lei
On Thu, Apr 26, 2018 at 10:24:43AM -0600, Keith Busch wrote: > On Thu, Apr 26, 2018 at 08:39:56PM +0800, Ming Lei wrote: > > diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c > > index 5d05a04f8e72..1e058deb4718 100644 > > --- a/drivers/nvme/host/pci.c

Re: [PATCH 1/2] nvme: pci: simplify timeout handling

2018-04-27 Thread Ming Lei
On Fri, Apr 27, 2018 at 09:37:06AM +0800, jianchao.wang wrote: > > > On 04/26/2018 11:57 PM, Ming Lei wrote: > > Hi Jianchao, > > > > On Thu, Apr 26, 2018 at 11:07:56PM +0800, jianchao.wang wrote: > >> Hi Ming > >> > >> Thanks for your w

Re: [PATCH 1/2] nvme: pci: simplify timeout handling

2018-04-26 Thread Ming Lei
On Thu, Apr 26, 2018 at 11:57:22PM +0800, Ming Lei wrote: > Hi Jianchao, > > On Thu, Apr 26, 2018 at 11:07:56PM +0800, jianchao.wang wrote: > > Hi Ming > > > > Thanks for your wonderful solution. :) > > > > On 04/26/2018 08:39 PM, Ming Lei wrote: >

Re: [PATCH 1/2] nvme: pci: simplify timeout handling

2018-04-26 Thread Ming Lei
Hi Jianchao, On Thu, Apr 26, 2018 at 11:07:56PM +0800, jianchao.wang wrote: > Hi Ming > > Thanks for your wonderful solution. :) > > On 04/26/2018 08:39 PM, Ming Lei wrote: > > +/* > > + * This one is called after queues are quiesced, and no in-fligh timeout > >

[PATCH 2/2] nvme: pci: guarantee EH can make progress

2018-04-26 Thread Ming Lei
me> Cc: linux-n...@lists.infradead.org Signed-off-by: Ming Lei <ming@redhat.com> --- drivers/nvme/host/pci.c | 27 +++ 1 file changed, 23 insertions(+), 4 deletions(-) diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c index 5d05a04f8e72..1e058deb4718 100644 --- a

[PATCH 1/2] nvme: pci: simplify timeout handling

2018-04-26 Thread Ming Lei
rom the horible test of block/011. Cc: Jianchao Wang <jianchao.w.w...@oracle.com> Cc: Christoph Hellwig <h...@lst.de> Cc: Sagi Grimberg <s...@grimberg.me> Cc: linux-n...@lists.infradead.org Signed-off-by: Ming Lei <ming@redhat.com> --- drivers/nvme/host/core.c | 11 +++

[PATCH 0/2] nvme: pci: fix & improve timeout handling

2018-04-26 Thread Ming Lei
Hi, This first patch introduces EH kthread for handling timeout, and simplifies the logics a lot, and fixes reports on block/011. The 2nd one fixes the issue reported by Jianchao, in which admin req may time out in EH. Ming Lei (2): nvme: pci: simplify timeout handling nvme: pci: guarantee

[PATCH] Revert "blk-mq: remove code for dealing with remapping queue"

2018-04-24 Thread Ming Lei
: Ewan Milne <emi...@redhat.com> Cc: Stefan Haberland <s...@linux.vnet.ibm.com> Cc: Christian Borntraeger <borntrae...@de.ibm.com> Cc: Christoph Hellwig <h...@lst.de> Cc: Sagi Grimberg <s...@grimberg.me> Signed-off-by: Ming Lei <ming..

[PATCH 3/3] scsi: avoid to hold host-wide counter of host_busy for scsi_mq

2018-04-20 Thread Ming Lei
ai <kashyap.de...@broadcom.com> Cc: Mike Snitzer <snit...@redhat.com> Cc: Hannes Reinecke <h...@suse.de> Cc: Laurence Oberman <lober...@redhat.com> Signed-off-by: Ming Lei <ming@redhat.com> --- drivers/scsi/hosts.c| 24 +++- drivers/scsi/scsi_li

[PATCH 1/3] scsi: introduce scsi_host_busy()

2018-04-20 Thread Ming Lei
tnership.com>, Cc: Christoph Hellwig <h...@lst.de>, Cc: Don Brace <don.br...@microsemi.com> Cc: Kashyap Desai <kashyap.de...@broadcom.com> Cc: Mike Snitzer <snit...@redhat.com> Cc: Hannes Reinecke <h...@suse.de> Cc: Laurence Oberman <lober...@redhat.com> Sign

[PATCH 2/3] scsi: read host_busy via scsi_host_busy()

2018-04-20 Thread Ming Lei
Hellwig <h...@lst.de>, Cc: Don Brace <don.br...@microsemi.com> Cc: Kashyap Desai <kashyap.de...@broadcom.com> Cc: Mike Snitzer <snit...@redhat.com> Cc: Hannes Reinecke <h...@suse.de> Cc: Laurence Oberman <lober...@redhat.com> Signed-off-by: Ming Lei <m

[PATCH 0/3] scsi: scsi-mq: don't hold host_busy in IO path

2018-04-20 Thread Ming Lei
Hi, This patches removes the expensive atomic opeation on host-wide counter of .host_busy for scsi-mq, and it is observed that IOPS can be increased by 15% with this change in IO test over scsi_debug. Ming Lei (3): scsi: introduce scsi_host_busy() scsi: read host_busy via scsi_host_busy
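The cover letter's idea is to drop a host-wide atomic counter from the I/O path and compute the busy count only when something asks for it. The userspace sketch below illustrates that trade-off. It is not the patch's code: `struct host`, `NR_TAGS`, `host_init()`, `start_request()`, `complete_request()` and `host_busy()` are hypothetical names, and per-tag atomic flags stand in for blk-mq's per-request state.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tag-space size for the sketch. */
#define NR_TAGS 64

/*
 * Hypothetical per-host state: one in-flight flag per tag instead of a
 * single shared counter. Submission and completion each touch only their
 * own tag's flag, so the hot path has no host-wide read-modify-write.
 */
struct host {
	atomic_bool in_flight[NR_TAGS];
};

static void host_init(struct host *h)
{
	for (size_t tag = 0; tag < NR_TAGS; tag++)
		atomic_init(&h->in_flight[tag], false);
}

static void start_request(struct host *h, size_t tag)
{
	assert(tag < NR_TAGS);
	/* Relaxed store to this tag's own flag only. */
	atomic_store_explicit(&h->in_flight[tag], true, memory_order_relaxed);
}

static void complete_request(struct host *h, size_t tag)
{
	assert(tag < NR_TAGS);
	atomic_store_explicit(&h->in_flight[tag], false, memory_order_relaxed);
}

/*
 * Slow path: walk every tag and count the in-flight ones, the way a
 * tag-set iterator with a counting callback would.
 */
static int host_busy(struct host *h)
{
	int cnt = 0;

	for (size_t tag = 0; tag < NR_TAGS; tag++)
		if (atomic_load_explicit(&h->in_flight[tag], memory_order_relaxed))
			cnt++;
	return cnt;
}
```

The value returned by `host_busy()` is a snapshot. Under concurrent submission it can be stale by the time the caller reads it, which is the price of moving the cost out of the fast path.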

Re: [PATCH V4 0/2] blk-mq: fix race between completion and BLK_EH_RESET_TIMER

2018-04-18 Thread Ming Lei
On Mon, Apr 16, 2018 at 03:12:30PM +0200, Martin Steigerwald wrote: > Ming Lei - 16.04.18, 02:45: > > On Sun, Apr 15, 2018 at 06:31:44PM +0200, Martin Steigerwald wrote: > > > Hi Ming. > > > > > > Ming Lei - 15.04.18, 17:43: > > > > Hi Jens, >

Re: [PATCH] blk-mq: start request gstate with gen 1

2018-04-16 Thread Ming Lei
fore the > scsi_cmnd is not initialized, scsi_cmnd->device is still NULL at > the moment, then we will get crash. > > Cc: Bart Van Assche <bart.vanass...@wdc.com> > Cc: Tejun Heo <t...@kernel.org> > Cc: Ming Lei <ming@redhat.com> > Cc: Martin Steigerwald

[PATCH] target: fix crash with iscsi target and dvd

2018-04-16 Thread Ming Lei
.@wdc.com> Cc: target-de...@vger.kernel.org Cc: linux-s...@vger.kernel.org Cc: "Nicholas A. Bellinger" <n...@linux-iscsi.org> Cc: Christoph Hellwig <h...@lst.de> Fixes: 84c8590646d5b35804 ("target: avoid accessing .bi_vcnt directly") Signed-off-by: Ming L

Re: [PATCH] blk-mq: mark hctx RESTART when get budget fails

2018-04-16 Thread Ming Lei
On Mon, Apr 16, 2018 at 03:55:36PM +0800, Jianchao Wang wrote: > When get budget fails, blk_mq_sched_dispatch_requests does not do > anything to ensure the hctx to be restarted. We can survive from > this, because only the scsi implements .get_budget and it always > runs the hctx queues when

Re: [PATCH V4 0/2] blk-mq: fix race between completion and BLK_EH_RESET_TIMER

2018-04-15 Thread Ming Lei
On Sun, Apr 15, 2018 at 06:31:44PM +0200, Martin Steigerwald wrote: > Hi Ming. > > Ming Lei - 15.04.18, 17:43: > > Hi Jens, > > > > This two patches fixes the recently discussed race between completion > > and BLK_EH_RESET_TIMER. > > > > Israel &

[PATCH V4 1/2] blk-mq: set RQF_MQ_TIMEOUT_EXPIRED when the rq's timeout isn't handled

2018-04-15 Thread Ming Lei
m> Cc: Tejun Heo <t...@kernel.org> Cc: Christoph Hellwig <h...@lst.de> Cc: Ming Lei <ming@redhat.com> Cc: Sagi Grimberg <s...@grimberg.me> Cc: Israel Rukshin <isra...@mellanox.com>, Cc: Max Gurtovoy <m...@mellanox.com> Cc: Martin Steigerwald <mar...@lichtvoll.de>

[PATCH V4 2/2] blk-mq: fix race between complete and BLK_EH_RESET_TIMER

2018-04-15 Thread Ming Lei
<jianchao.w.w...@oracle.com> Cc: Bart Van Assche <bart.vanass...@wdc.com> Cc: Tejun Heo <t...@kernel.org> Cc: Christoph Hellwig <h...@lst.de> Cc: Ming Lei <ming@redhat.com> Cc: Sagi Grimberg <s...@grimberg.me> Cc: Israel Rukshin <isra...@mellanox.com>, Cc: Max Gur

[PATCH V4 0/2] blk-mq: fix race between completion and BLK_EH_RESET_TIMER

2018-04-15 Thread Ming Lei
imer() and blk_mq_rq_update_aborted_gstate(req, 0) Ming Lei (2): blk-mq: set RQF_MQ_TIMEOUT_EXPIRED when the rq's timeout isn't handled blk-mq: fix race between complete and BLK_EH_RESET_TIMER block/blk-mq.c | 120 +++-- block/blk-mq.h | 1 + block

Re: [PATCH V3] blk-mq: fix race between complete and BLK_EH_RESET_TIMER

2018-04-15 Thread Ming Lei
On Sat, Apr 14, 2018 at 03:22:07PM +, Bart Van Assche wrote: > On Fri, 2018-04-13 at 21:06 -0600, Jens Axboe wrote: > > I like this approach since it keeps the cost outside of the fast > > path. And it's fine to reuse the queue lock for this, instead of > > adding a special lock for something

Re: [PATCH 2/2] loop: handle short DIO reads

2018-04-14 Thread Ming Lei
struct bio *bio = rq->bio; > + > + while (bio) { > + zero_fill_bio(bio); > + bio = bio->bi_next; > + } > + } > + ret = BLK_STS_IOERR; > +end_io: > + blk_mq_end_request(rq, ret); > + } > } > > static void lo_rw_aio_do_completion(struct loop_cmd *cmd) > -- > 2.7.4 > Looks fine, short read will be guaranteed to complete when zero read is triggered. Reviewed-by: Ming Lei <ming@redhat.com> -- Ming
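Ming's review note ("short read will be guaranteed to complete when zero read is triggered") is a termination argument: every retry either makes progress or ends the request. The userspace analogue below illustrates that argument; it is not the loop driver's code. It assumes, beyond what this excerpt shows, that a short read which returned some data is retried for the remainder. The `read_fn` callback, `read_full()`, and the `short_src`/`short_read()` demo source are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical read callback: bytes read, 0 when no data is left, -1 on error. */
typedef ssize_t (*read_fn)(void *ctx, char *buf, size_t len, size_t off);

/*
 * Fill buf[0..len) from offset off.
 * - A short read that made progress is retried for the remainder.
 * - A zero-byte read cannot make progress, so the rest of the buffer is
 *   zero-filled and an error is returned; the loop always terminates.
 * Returns 0 on success, -1 on error.
 */
static int read_full(read_fn rd, void *ctx, char *buf, size_t len, size_t off)
{
	size_t done = 0;

	assert(rd != NULL && buf != NULL);
	while (done < len) {
		ssize_t ret = rd(ctx, buf + done, len - done, off + done);

		if (ret < 0)
			return -1;
		if (ret == 0) {
			memset(buf + done, 0, len - done);
			return -1;
		}
		done += (size_t)ret;
	}
	return 0;
}

/* Demo source: a fixed byte string served at most 3 bytes per call. */
struct short_src {
	const char *data;
	size_t size;
};

static ssize_t short_read(void *ctx, char *buf, size_t len, size_t off)
{
	const struct short_src *s = ctx;
	size_t n;

	if (off >= s->size)
		return 0;
	n = s->size - off;
	if (n > len)
		n = len;
	if (n > 3)
		n = 3;
	memcpy(buf, s->data + off, n);
	return (ssize_t)n;
}
```

Reading 6 bytes at offset 5 of an 8-byte source returns "fgh" after one short read. The next read returns 0, so the last 3 bytes are zeroed and the call fails instead of spinning.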

Re: [PATCH 1/2] loop: remove cmd->rq member

2018-04-14 Thread Ming Lei
_tag_set > *set, struct request *rq, > { > struct loop_cmd *cmd = blk_mq_rq_to_pdu(rq); > > - cmd->rq = rq; > kthread_init_work(&cmd->work, loop_queue_work); > - > return 0; > } > > diff --git a/drivers/block/loop.h b/drivers/block/loop.h > index 0f45416e4fcf..b78de9879f4f 100644 > --- a/drivers/block/loop.h > +++ b/drivers/block/loop.h > @@ -66,7 +66,6 @@ struct loop_device { > > struct loop_cmd { > struct kthread_work work; > - struct request *rq; > bool use_aio; /* use AIO interface to handle I/O */ > atomic_t ref; /* only for aio */ > long ret; > -- > 2.7.4 > Reviewed-by: Ming Lei <ming@redhat.com> -- Ming

Re: [PATCH v5] blk-mq: Avoid that a completion can be ignored for BLK_EH_RESET_TIMER

2018-04-13 Thread Ming Lei
y nr_expired. > - Remove the code that became superfluous due to this change, namely > the RCU lock and unlock statements in blk_mq_complete_request() and > also the synchronize_rcu() call in the timeout handler. > > Signed-off-by: Bart Van Assche <bart.vanass...@wdc.com

Re: [PATCH] blk-mq: fix race between complete and BLK_EH_RESET_TIMER

2018-04-13 Thread Ming Lei
On Thu, Apr 12, 2018 at 06:57:12AM -0700, Tejun Heo wrote: > On Thu, Apr 12, 2018 at 07:05:13AM +0800, Ming Lei wrote: > > > Not really because aborted_gstate right now doesn't have any memory > > > barrier around it, so nothing ensures blk_add_timer() actually appears

Re: 4.15.14 crash with iscsi target and dvd

2018-04-12 Thread Ming Lei
On Thu, Apr 12, 2018 at 09:43:02PM -0400, Wakko Warner wrote: > Ming Lei wrote: > > On Tue, Apr 10, 2018 at 08:45:25PM -0400, Wakko Warner wrote: > > > Sorry for the delay. I reverted my change, added this one. I didn't > > > reboot, I just unloaded and loaded thi

[PATCH V3] blk-mq: fix race between complete and BLK_EH_RESET_TIMER

2018-04-12 Thread Ming Lei
m> Cc: Tejun Heo <t...@kernel.org> Cc: Christoph Hellwig <h...@lst.de> Cc: Ming Lei <ming@redhat.com> Cc: Sagi Grimberg <s...@grimberg.me> Cc: Israel Rukshin <isra...@mellanox.com>, Cc: Max Gurtovoy <m...@mellanox.com> Cc: sta...@vger.kernel.org Signed-off-b

Re: 4.15.14 crash with iscsi target and dvd

2018-04-12 Thread Ming Lei
On Tue, Apr 10, 2018 at 08:45:25PM -0400, Wakko Warner wrote: > Ming Lei wrote: > > Sure, thanks for your sharing. > > > > Wakko, could you test the following patch and see if there is any > > difference? > > > > -- > > diff --git a/drivers/tar

Re: [PATCH V2] blk-mq: fix race between complete and BLK_EH_RESET_TIMER

2018-04-12 Thread Ming Lei
On Thu, Apr 12, 2018 at 10:38:56AM +0800, jianchao.wang wrote: > Hi Ming > > On 04/12/2018 07:38 AM, Ming Lei wrote: > > +* > > +* Cover complete vs BLK_EH_RESET_TIMER race in slow path with > > +* helding queue lock. > >

Re: [PATCH V2] blk-mq: fix race between complete and BLK_EH_RESET_TIMER

2018-04-12 Thread Ming Lei
Hi Jianchao, On Thu, Apr 12, 2018 at 10:38:56AM +0800, jianchao.wang wrote: > Hi Ming > > On 04/12/2018 07:38 AM, Ming Lei wrote: > > +* > > +* Cover complete vs BLK_EH_RESET_TIMER race in slow path with > > +* helding queue lock. > >

[PATCH V2] blk-mq: fix race between complete and BLK_EH_RESET_TIMER

2018-04-11 Thread Ming Lei
. Cc: Bart Van Assche <bart.vanass...@wdc.com> Cc: Tejun Heo <t...@kernel.org> Cc: Christoph Hellwig <h...@lst.de> Cc: Ming Lei <ming@redhat.com> Cc: Sagi Grimberg <s...@grimberg.me> Cc: Israel Rukshin <isra...@mellanox.com>, Cc: Max Gurtovoy <m...@mellanox

Re: [PATCH] blk-mq: fix race between complete and BLK_EH_RESET_TIMER

2018-04-11 Thread Ming Lei
On Wed, Apr 11, 2018 at 10:49:51PM +, Bart Van Assche wrote: > On Thu, 2018-04-12 at 04:55 +0800, Ming Lei wrote: > > +again: > > switch (ret) { > > case BLK_EH_HANDLED: > > __blk_mq_complete_request(req); > > break

Re: [PATCH] blk-mq: fix race between complete and BLK_EH_RESET_TIMER

2018-04-11 Thread Ming Lei
On Wed, Apr 11, 2018 at 02:30:07PM -0700, Tejun Heo wrote: > Hello, Ming. > > On Thu, Apr 12, 2018 at 04:55:29AM +0800, Ming Lei wrote: > ... > > + spin_lock_irqsave(req->q->queue_lock, flags); > > + if (blk_mq_rq_state(r

[PATCH] blk-mq: fix race between complete and BLK_EH_RESET_TIMER

2018-04-11 Thread Ming Lei
. Cc: Bart Van Assche <bart.vanass...@wdc.com> Cc: Tejun Heo <t...@kernel.org> Cc: Christoph Hellwig <h...@lst.de> Cc: Ming Lei <ming@redhat.com> Cc: Sagi Grimberg <s...@grimberg.me> Cc: Israel Rukshin <isra...@mellanox.com>, Cc: Max Gurtovoy <m...@mellanox

[PATCH] blk-mq: Revert "blk-mq: reimplement blk_mq_hw_queue_mapped"

2018-04-11 Thread Ming Lei
<s...@grimberg.me> Reported-by: Jens Axboe <ax...@kernel.dk> Fixes: 127276c6ce5a30fcc ("blk-mq: reimplement blk_mq_hw_queue_mapped") Signed-off-by: Ming Lei <ming@redhat.com> --- block/blk-mq.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/bloc

Re: Hang with blk-mq map series (block/008)

2018-04-10 Thread Ming Lei
On Tue, Apr 10, 2018 at 08:08:43PM -0600, Jens Axboe wrote: > On 4/10/18 8:02 PM, Ming Lei wrote: > > On Tue, Apr 10, 2018 at 09:51:41AM -0600, Jens Axboe wrote: > >> Hi Ming, > >> > >> Ran the above blktests on the current tree, and we end up getting

Re: [PATCH v2] blk-mq: Avoid that submitting a bio concurrently with device removal triggers a crash

2018-04-10 Thread Ming Lei
() > call with blk_queue_enter() / blk_queue_exit(). > > Reported-by: Ming Lei <ming@redhat.com> > Fixes: a063057d7c73 ("block: Fix a race between request queue removal and the > block cgroup controller") > Signed-off-by: Bart Van Assche <bart.vanass...@wdc.

Re: [PATCH v5] blk-mq: Avoid that a completion can be ignored for BLK_EH_RESET_TIMER

2018-04-10 Thread Ming Lei
On Tue, Apr 10, 2018 at 03:01:57PM -0600, Bart Van Assche wrote: > The blk-mq timeout handling code ignores completions that occur after > blk_mq_check_expired() has been called and before blk_mq_rq_timed_out() > has reset rq->aborted_gstate. If a block driver timeout handler always > returns

Re: [PATCH] blk-mq: Avoid that submitting a bio concurrently with device removal triggers a crash

2018-04-10 Thread Ming Lei
() > call with blk_queue_enter() / blk_queue_exit(). > > Reported-by: Ming Lei <ming@redhat.com> > Fixes: a063057d7c73 ("block: Fix a race between request queue removal and the > block cgroup controller") > Signed-off-by: Bart Van Assche <bart.vanass...@wdc.

Re: Hang with blk-mq map series (block/008)

2018-04-10 Thread Ming Lei
On Tue, Apr 10, 2018 at 09:51:41AM -0600, Jens Axboe wrote: > Hi Ming, > > Ran the above blktests on the current tree, and we end up getting > a hang that we never recover from. There's on request perpetually > stuck: > > root@dell:/sys/kernel/debug/block/nvme0n1/hctx2# cat busy >

Re: [PATCH v4] blk-mq: Fix race conditions in request timeout handling

2018-04-10 Thread Ming Lei
Hi Tejun, On Tue, Apr 10, 2018 at 08:30:31AM -0700, t...@kernel.org wrote: > Hello, Ming. > > On Tue, Apr 10, 2018 at 11:25:54PM +0800, Ming Lei wrote: > > + if (time_after_eq(jiffies, deadline) && > > + blk_mq_change_rq_state(rq, MQ
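The quoted hunk gates timeout handling on `blk_mq_change_rq_state(rq, MQ...` (the rest of the call is cut off in this excerpt). The general technique is a compare-and-swap on a per-request state word, so that only one of the completion path and the timeout path claims a given request. The sketch below illustrates it with C11 atomics. The states and function names are hypothetical, and the sketch does not model `BLK_EH_RESET_TIMER` re-arming a timed-out request.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical request states, loosely modeled on the MQ_RQ_* names in the thread. */
enum rq_state { RQ_IN_FLIGHT, RQ_TIMED_OUT, RQ_COMPLETE };

struct req {
	_Atomic int state;
};

/*
 * Atomically move from 'old' to 'new_state'. Only the single caller that
 * observes 'old' succeeds; a racing caller sees the updated value and
 * fails, so each request is handled by exactly one path.
 */
static bool change_state(struct req *rq, int old, int new_state)
{
	return atomic_compare_exchange_strong(&rq->state, &old, new_state);
}

/* Completion path: finishes the request only if it is still in flight. */
static bool rq_complete(struct req *rq)
{
	return change_state(rq, RQ_IN_FLIGHT, RQ_COMPLETE);
}

/* Timeout path: claims the request only if completion has not already won. */
static bool rq_time_out(struct req *rq)
{
	return change_state(rq, RQ_IN_FLIGHT, RQ_TIMED_OUT);
}
```

Whichever of `rq_complete()` and `rq_time_out()` runs first wins, and the loser sees `false`. No lock is needed for that decision, which is the property the timeout-vs-completion patches in this thread are trying to guarantee.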

Re: [PATCH v4] blk-mq: Fix race conditions in request timeout handling

2018-04-10 Thread Ming Lei
On Tue, Apr 10, 2018 at 03:02:11PM +, Bart Van Assche wrote: > On Tue, 2018-04-10 at 22:30 +0800, Ming Lei wrote: > > On Tue, Apr 10, 2018 at 02:09:33PM +, Bart Van Assche wrote: > > > Please keep in mind that all synchronize_rcu() does is to wait for pre- > >

Re: [PATCH v4] blk-mq: Fix race conditions in request timeout handling

2018-04-10 Thread Ming Lei
On Tue, Apr 10, 2018 at 02:09:33PM +, Bart Van Assche wrote: > On Tue, 2018-04-10 at 21:55 +0800, Ming Lei wrote: > > Then I have same question with Jianchao, what is the actual double > > complete in linus tree between BLK_EH_RESET_TIMER and normal completion? >

Re: [PATCH v4] blk-mq: Fix race conditions in request timeout handling

2018-04-10 Thread Ming Lei
On Tue, Apr 10, 2018 at 12:58:04PM +, Bart Van Assche wrote: > On Tue, 2018-04-10 at 16:41 +0800, Ming Lei wrote: > > On Mon, Apr 09, 2018 at 06:34:55PM -0700, Bart Van Assche wrote: > > > If a completion occurs after blk_mq_rq_timed_out() has reset > > > rq->

Re: [PATCH v4] blk-mq: Fix race conditions in request timeout handling

2018-04-10 Thread Ming Lei
On Tue, Apr 10, 2018 at 03:59:30PM +0800, jianchao.wang wrote: > Hi Bart > > On 04/10/2018 09:34 AM, Bart Van Assche wrote: > > If a completion occurs after blk_mq_rq_timed_out() has reset > > rq->aborted_gstate and the request is again in flight when the timeout > > expires then a request will

Re: [PATCH v4] blk-mq: Fix race conditions in request timeout handling

2018-04-10 Thread Ming Lei
On Mon, Apr 09, 2018 at 06:34:55PM -0700, Bart Van Assche wrote: > If a completion occurs after blk_mq_rq_timed_out() has reset > rq->aborted_gstate and the request is again in flight when the timeout Given rq->aborted_gstate is reset only for BLK_EH_RESET_TIMER, I think you are addressing the

Re: [block regression] kernel oops triggered by removing scsi device dring IO

2018-04-09 Thread Ming Lei
On Mon, Apr 09, 2018 at 10:54:57PM +, Bart Van Assche wrote: > On Mon, 2018-04-09 at 14:54 +0800, Joseph Qi wrote: > > The oops happens during generic_make_request_checks(), in > > blk_throtl_bio() exactly. > > So if we want to bypass dying queue, we have to check this before > >

Re: 4.15.14 crash with iscsi target and dvd

2018-04-09 Thread Ming Lei
On Mon, Apr 09, 2018 at 07:43:01PM -0400, Wakko Warner wrote: > Ming Lei wrote: > > On Mon, Apr 09, 2018 at 09:30:11PM +, Bart Van Assche wrote: > > > Hello Ming, > > > > > > Can you have a look at this? The start of this e-mail thread is available

Re: 4.15.14 crash with iscsi target and dvd

2018-04-09 Thread Ming Lei
ac60eb58b145839b5893e > > Author: Ming Lei <tom.leim...@gmail.com> > > Date: Fri Nov 11 20:05:32 2016 +0800 > > > > target: avoid accessing .bi_vcnt directly > > > > When the bio is full, bio_add_pc_page() will return zero, > > so

Re: [s390x] New regression was found on kernel-4.16

2018-04-09 Thread Ming Lei
On Mon, Apr 09, 2018 at 06:18:04PM +0800, Li Wang wrote: > Hi, > > I got this BUG_ON() on s390x platform with kernel-v4.16.0. > > [1.200196] [ cut here ] > [1.200201] kernel BUG at block/bio.c:1798! > [1.200228] illegal operation: 0001 ilc:1 [#1] SMP > [

Re: limits->max_sectors is getting set to 0, why/where? [was: Re: dm: kernel oops by divide error on v4.16+]

2018-04-09 Thread Ming Lei
> >>>> On 4/9/18 12:38 PM, Mike Snitzer wrote: > >>>>> On Mon, Apr 09 2018 at 11:51am -0400, > >>>>> Mike Snitzer <snit...@redhat.com> wrote: > >>>>> > >>>>>> On Sun, Apr 08 2018 at 12:00am -0400,

Re: BUG at IP: blk_mq_get_request+0x23e/0x390 on 4.16.0-rc7

2018-04-09 Thread Ming Lei
On Mon, Apr 09, 2018 at 11:31:37AM +0300, Sagi Grimberg wrote: > > > > My device exposes nr_hw_queues which is not higher than num_online_cpus > > > so I want to connect all hctxs with hope that they will be used. > > > > The issue is that CPU online & offline can happen any time, and after > >

Re: [block regression] kernel oops triggered by removing scsi device dring IO

2018-04-08 Thread Ming Lei
On Mon, Apr 09, 2018 at 09:33:08AM +0800, Joseph Qi wrote: > Hi Bart, > > On 18/4/8 22:50, Bart Van Assche wrote: > > On Sun, 2018-04-08 at 12:21 +0800, Ming Lei wrote: > >> The following kernel oops is triggered by 'removing scsi device' during > >> heavy

Re: BUG at IP: blk_mq_get_request+0x23e/0x390 on 4.16.0-rc7

2018-04-08 Thread Ming Lei
On Sun, Apr 08, 2018 at 04:35:59PM +0300, Sagi Grimberg wrote: > > > On 04/08/2018 03:57 PM, Ming Lei wrote: > > On Sun, Apr 08, 2018 at 02:53:03PM +0300, Sagi Grimberg wrote: > > > > > > > > > > > > Hi Sagi

Re: BUG at IP: blk_mq_get_request+0x23e/0x390 on 4.16.0-rc7

2018-04-08 Thread Ming Lei
On Sun, Apr 08, 2018 at 02:53:03PM +0300, Sagi Grimberg wrote: > > > > > > > > Hi Sagi > > > > > > > > > > > > > > Still can reproduce this issue with the change: > > > > > > > > > > > > Thanks for validating Yi, > > > > > > > > > > > > Would it be possible to test the following: > > > > > >

Re: BUG at IP: blk_mq_get_request+0x23e/0x390 on 4.16.0-rc7

2018-04-08 Thread Ming Lei
On Sun, Apr 08, 2018 at 01:58:49PM +0300, Sagi Grimberg wrote: > > > > > > Hi Sagi > > > > > > > > > > Still can reproduce this issue with the change: > > > > > > > > Thanks for validating Yi, > > > > > > > > Would it be possible to test the following: > > > > -- > > > > diff --git

Re: BUG at IP: blk_mq_get_request+0x23e/0x390 on 4.16.0-rc7

2018-04-08 Thread Ming Lei
On Sun, Apr 08, 2018 at 06:44:33PM +0800, Ming Lei wrote: > On Sun, Apr 08, 2018 at 01:36:27PM +0300, Sagi Grimberg wrote: > > > > > Hi Sagi > > > > > > Still can reproduce this issue with the change: > > > > Thanks for validating Yi, >

Re: BUG at IP: blk_mq_get_request+0x23e/0x390 on 4.16.0-rc7

2018-04-08 Thread Ming Lei
On Sun, Apr 08, 2018 at 01:36:27PM +0300, Sagi Grimberg wrote: > > > Hi Sagi > > > > Still can reproduce this issue with the change: > > Thanks for validating Yi, > > Would it be possible to test the following: > -- > diff --git a/block/blk-mq.c b/block/blk-mq.c > index
