Reviewed-by: Sagi Grimberg
Yes, I'm very much in favour of this, too.
We always have this IMO slightly weird notion of stopping the queue,
setting some error flags in the driver, then _restarting_ the queue, just
so that the driver then sees the error flag and terminates the requests.
Which I always found quite
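For contrast, a minimal sketch of the direct approach being favoured here,
assuming the nvme_cancel_request callback as used from nvme_dev_disable():
with the queues quiesced, iterate the tagset once and terminate everything,
no flag-and-restart dance required.

/* Sketch, not the exact patch under review: with the queues quiesced,
 * walk every tagged request and cancel it directly. */
nvme_stop_queues(&dev->ctrl);
blk_mq_tagset_busy_iter(&dev->tagset, nvme_cancel_request, &dev->ctrl);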
If it really becomes an issue we
should rework the nvme code to also skip the multipath code for any
private namespace, even if that could mean some trouble when rescanning.
This requires some explanation? Skip the multipath code how?
We currently always go through the multipath node as
+static int nvme_poll_irqdisable(struct nvme_queue *nvmeq, unsigned int tag)
Do we still need to carry the tag around?
Yes, the timeout handler polls for a specific tag.
Does it have to? the documentation suggests that we missed
an interrupt, so it is probably waiting on the completion
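For reference, a paraphrased sketch of the helper in question (from my
reading of the series, not a verbatim quote; the dedicated poll-queue
branch is elided): the timeout path reaps the CQ with the vector masked
and learns whether the timed-out tag was among the reaped completions.

static int nvme_poll_irqdisable(struct nvme_queue *nvmeq, unsigned int tag)
{
	struct pci_dev *pdev = to_pci_dev(nvmeq->dev->dev);
	u16 start, end;
	int found;

	/* mask the vector so we don't race the interrupt handler */
	disable_irq(pci_irq_vector(pdev, nvmeq->cq_vector));
	found = nvme_process_cq(nvmeq, &start, &end, tag);
	enable_irq(pci_irq_vector(pdev, nvmeq->cq_vector));

	nvme_complete_cqes(nvmeq, start, end);
	return found;	/* non-zero if @tag was seen */
}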
Nit, there seems to be an extra newline that can be omitted here before
the else if statement (if I'm reading this correctly)...
Empty lines can always be omitted, but in this case I actually like it
as it seems to help readability.
If you think it's useful I'm fine with it as is...
Reviewed-by: Sagi Grimberg
Reviewed-by: Sagi Grimberg
Reviewed-by: Sagi Grimberg
Reviewed-by: Sagi Grimberg
Reviewed-by: Sagi Grimberg
Reviewed-by: Sagi Grimberg
A driver may wish to iterate every tagged request, not just ones that
satisfy blk_mq_request_started(). The intended use is so a driver may
terminate entered requests on quiesced queues.
How about we just move the started check into the handler passed in for
those that care about it? Much
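A sketch of that suggestion, assuming nvme_cancel_request as the handler
(the filtering moves out of the iterator and into the callback):

static void nvme_cancel_request(struct request *req, void *data, bool reserved)
{
	/* the started check now lives here, not in blk_mq_tagset_busy_iter() */
	if (!blk_mq_request_started(req))
		return;

	nvme_req(req)->status = NVME_SC_ABORT_REQ;
	blk_mq_complete_request(req);
}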
Reviewed-by: Sagi Grimberg
Reviewed-by: Sagi Grimberg
Reviewed-by: Sagi Grimberg
Other than that,
Reviewed-by: Sagi Grimberg
of disabling interrupts.
With that we can stop taking the cq_lock for normal queues.
Nice,
Reviewed-by: Sagi Grimberg
Reviewed-by: Sagi Grimberg
@@ -2428,7 +2426,8 @@ static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown)
 	nvme_stop_queues(&dev->ctrl);
 	if (!dead && dev->ctrl.queue_count > 0) {
-		nvme_disable_io_queues(dev);
+		if (nvme_disable_io_queues(dev, nvme_admin_delete_sq))
+			nvme_disable_io_queues(dev, nvme_admin_delete_cq);
Other than that,
Reviewed-by: Sagi Grimberg
Reviewed-by: Sagi Grimberg
Looks good,
Reviewed-by: Sagi Grimberg
@@ -2173,6 +2157,8 @@ static int nvme_setup_io_queues(struct nvme_dev *dev)
 	if (nr_io_queues == 0)
 		return 0;
+
+	clear_bit(NVMEQ_ENABLED, &adminq->flags);
This is a change of behavior, looks correct though as we can fail
nvme_setup_irqs after we freed the admin
== REQ_OP_READ))
+		type = HCTX_TYPE_READ;
Nit, there seems to be an extra newline that can be omitted here before
the else if statement (if I'm reading this correctly)...
Otherwise looks good,
Reviewed-by: Sagi Grimberg
Reviewed-by: Sagi Grimberg
Reviewed-by: Sagi Grimberg
We currently only really support sync poll, i.e. poll with 1
IO in flight. This prepares us for supporting async poll.
Hey Jens,
So are we sure that this is fine to simply replace the
poll functionality? You say that we support poll
with only 1 I/O inflight but is it entirely true?
This opportunistic poll is pretty bogus now as we never set the HIPRI
flag and it should probably be removed in a prep patch. We should then
later try to use a scheme similar to your aio polling for the nvme
target as well.
I'll kill it in a pre-patch.
Agreed..
That is because your for-linus was still 4.20-rc based when
I created it.
It's never been 4.20-rc based. After the initial merge, it got based
on a random commit in the merge window:
commit 3acbd2de6bc3af215c6ed7732dfc097d1e238503
Merge: d49f8a52b15b de7d83da84bd
Author: Linus Torvalds
Subject: Re: [PATCH v2] block: fix rdma queue mapping
Sagi - From what I can tell, i40iw is also exposed to this same issue
if the IRQ affinity is configured by user.
managed affinity does not allow setting smp_affinity from userspace.
OK. But our device IRQs are not set-up as to be
Christoph, Sagi: it seems you think /proc/irq/$IRP/smp_affinity
shouldn't be allowed if drivers support managed affinity. Is that correct?
Not just shouldn't, but simply can't.
But as it stands, things are just plain borked if an rdma driver
supports ib_get_vector_affinity() yet the admin
Can we please make forward progress on this?
Christoph, Sagi: it seems you think /proc/irq/$IRP/smp_affinity
shouldn't be allowed if drivers support managed affinity. Is that
correct?
Perhaps that can be codified and be a way forward? IE: Somehow allow
the admin to choose either "managed
Hi Sagi,
did you have a chance to look at Israel's and my fixes that we
attached to the first thread?
There are a few issues with this approach. For example, in case you don't
have a "free" cpu in the mask for Qi and you take a cpu from the Qi+j mask.
Also in case we have a non-symmetrical
nvme-rdma attempts to map queues based on irq vector affinity.
However, for some devices, completion vector irq affinity is
configurable by the user which can break the existing assumption
that irq vectors are optimally arranged over the host cpu cores.
IFF affinity is configurable we should
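For context, the helper whose assumption is being broken; as I remember
block/blk-mq-rdma.c, it trusts ib_get_vector_affinity() and only falls
back to the naive mapping when the device reports no affinity at all:

int blk_mq_rdma_map_queues(struct blk_mq_tag_set *set,
		struct ib_device *dev, int first_vec)
{
	const struct cpumask *mask;
	unsigned int queue, cpu;

	for (queue = 0; queue < set->nr_hw_queues; queue++) {
		mask = ib_get_vector_affinity(dev, first_vec + queue);
		if (!mask)
			goto fallback;
		/* every cpu in the vector's mask is served by this queue */
		for_each_cpu(cpu, mask)
			set->mq_map[cpu] = queue;
	}
	return 0;

fallback:
	return blk_mq_map_queues(set);
}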
I guess this way still can't fix the request allocation crash issue
triggered by using blk_mq_alloc_request_hctx(), in which one hw queue may
not be mapped from any online CPU.
Not really. I guess we will need to simply skip queues that are
mapped to an offline cpu.
Maybe this patch isn't
, we map it to any unmapped
cpu we can find. Then, map the remaining cpus in the possible cpumap
naively.
Tested-by: Steve Wise
Signed-off-by: Sagi Grimberg
---
Changes from v1:
- fixed double semicolon typo
block/blk-mq-cpumap.c | 39 +++-
block/blk-mq-rdma.c | 80
, we map it to any unmapped
cpu we can find. Then, map the remaining cpus in the possible cpumap
naively.
Tested-by: Steve Wise
Signed-off-by: Sagi Grimberg
---
See thread that started here:
https://marc.info/?l=linux-rdma&m=153172982318299&w=2
block/blk-mq-cpumap.c | 39
Reviewed-by: Sagi Grimberg
Reviewed-by: Sagi Grimberg
Reviewed-by: Sagi Grimberg
Reviewed-by: Sagi Grimberg
Looks good,
Reviewed-by: Sagi Grimberg
Looks good,
Reviewed-by: Sagi Grimberg
Looks good,
Reviewed-by: Sagi Grimberg
Reviewed-by: Sagi Grimberg
Looks good,
Reviewed-by: Sagi Grimberg
Looks good,
Reviewed-by: Sagi Grimberg
Looks good,
Reviewed-by: Sagi Grimberg
Looks good,
Reviewed-by: Sagi Grimberg
+u16 nvmet_zero_sgl(struct nvmet_req *req, off_t off, size_t len)
Would personally prefer nvmet_clear_sgl, but not hard-headed on it,
Reviewed-by: Sagi Grimberg
Looks good,
Reviewed-by: Sagi Grimberg
Looks good,
Reviewed-by: Sagi Grimberg
Looks good,
Reviewed-by: Sagi Grimberg
static void __blk_mq_requeue_request(struct request *rq)
{
	struct request_queue *q = rq->q;
+	enum mq_rq_state old_state = blk_mq_rq_state(rq);
	blk_mq_put_driver_tag(rq);
	trace_block_rq_requeue(q, rq);
	wbt_requeue(q->rq_wb, &rq->issue_stat);
-	if
Ah, yeah, I was moving it out of add_timer but forgot to actually add
it to the issue path. Fixed patch below.
BTW, no matter what we do w/ the request handover between normal and
timeout paths, we'd need something similar. Otherwise, while we can
reduce the window, we can't get rid of it.
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 75336848f7a7..81ced3096433 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -444,6 +444,10 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q,
 		return ERR_PTR(-EXDEV);
 	}
 	cpu =
Reviewed-by: Sagi Grimberg <s...@grimberg.me>
WRITE_ONCE(rq->gstate, new_val);
+	return cmpxchg(&rq->__deadline, old_val, new_val) == old_val;
}
Can you explain why this takes the old_state of the request?
Otherwise this looks good to me,
Reviewed-by: Sagi Grimberg <s...@grimberg.me>
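To make the old_state question concrete, a minimal self-contained sketch
of the pattern (not Bart's actual patch): passing the expected old state
makes the transition conditional, so a caller that lost a race with a
concurrent completion or timeout sees the cmpxchg fail instead of
silently clobbering the other side's transition.

static bool change_rq_state(atomic_t *state, int old_state, int new_state)
{
	/* succeeds only if nobody changed the state since we sampled it */
	return atomic_cmpxchg(state, old_state, new_state) == old_state;
}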
The blk-mq timeout handling code ignores completions that occur after
blk_mq_check_expired() has been called and before blk_mq_rq_timed_out()
has reset rq->aborted_gstate. If a block driver timeout handler always
returns BLK_EH_RESET_TIMER then the result will be that the request
never
Looks good,
Reviewed-by: Sagi Grimberg <s...@grimberg.me>
Looks good,
Reviewed-by: Sagi Grimberg <s...@grimberg.me>
Looks good,
Reviewed-by: Sagi Grimberg <s...@grimberg.me>
Looks good,
Reviewed-by: Sagi Grimberg <s...@grimberg.me>
Looks good,
Reviewed-by: Sagi Grimberg <s...@grimberg.me>
Looks good,
Reviewed-by: Sagi Grimberg <s...@grimberg.me>
Looks good,
Reviewed-by: Sagi Grimberg <s...@grimberg.me>
Looks good,
Reviewed-by: Sagi Grimberg <s...@grimberg.me>
Hi Sagi
Sorry for the late response, below patch works, here is the full log:
Thanks for testing!
Now that we isolated the issue, the question is whether this fix is
correct given that we are guaranteed that the connect context will run
on an online cpu?
another reference to the patch (we can
If a completion occurs after blk_mq_rq_timed_out() has reset
rq->aborted_gstate and the request is again in flight when the timeout
expires then a request will be completed twice: a first time by the
timeout handler and a second time when the regular completion occurs.
Additionally, the blk-mq
My device exposes nr_hw_queues which is not higher than num_online_cpus,
so I want to connect all hctxs with the hope that they will be used.
The issue is that CPU online & offline can happen at any time, and after
blk-mq removed the CPU hotplug handler, there is no way to remap a queue
when the CPU topo is
On 04/08/2018 03:57 PM, Ming Lei wrote:
On Sun, Apr 08, 2018 at 02:53:03PM +0300, Sagi Grimberg wrote:
Hi Sagi
Still can reproduce this issue with the change:
Thanks for validating Yi,
Would it be possible to test the following:
--
diff --git a/block/blk-mq.c b/block/blk-mq.c
index
Hi Sagi
Still can reproduce this issue with the change:
Thanks for validating Yi,
Would it be possible to test the following:
--
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 75336848f7a7..81ced3096433 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -444,6 +444,10 @@ struct request
On 03/30/2018 12:32 PM, Yi Zhang wrote:
Hello
I got this kernel BUG on 4.16.0-rc7, here is the reproducer and log, let me
know if you need more info, thanks.
Reproducer:
1. setup target
#nvmetcli restore /etc/rdma.json
2. connect target on host
#nvme connect-all -t rdma -a $IP -s 4420
-	if (nvmeq->sq_cmds_io)
-		memcpy_toio(&nvmeq->sq_cmds_io[tail], cmd, sizeof(*cmd));
-	else
-		memcpy(&nvmeq->sq_cmds[tail], cmd, sizeof(*cmd));
+	memcpy(&nvmeq->sq_cmds[tail], cmd, sizeof(*cmd));
Hmm, how safe is replacing memcpy_toio() with regular memcpy()? On PPC
On 01/03/18 04:03 AM, Sagi Grimberg wrote:
Can you describe what would be the plan to have it when these devices
do come along? I'd say that p2p_dev needs to become an nvmet_ns reference
rather than come from nvmet_ctrl. Then, when cmb capable devices come
along, the ns can prefer to use its own cmb
Looks fine,
Reviewed-by: Sagi Grimberg <s...@grimberg.me>
.
This looks fine to me,
Reviewed-by: Sagi Grimberg <s...@grimberg.me>
Any plans adding the capability to nvme-rdma? Should be
straight-forward... In theory, the use-case would be an rdma backend
fabric behind it. Shouldn't be hard to test either...
We create a configfs attribute in each nvme-fabrics target port to
enable p2p memory use. When enabled, the port will only then use the
p2p memory if a p2p memory device can be found which is behind the
same switch as the RDMA port and all the block devices in use. If
the user enabled it and no
Looks fine,
Reviewed-by: Sagi Grimberg <s...@grimberg.me>
On 03/01/2018 01:40 AM, Logan Gunthorpe wrote:
In order to use PCI P2P memory pci_p2pmem_[un]map_sg() functions must be
called to map the correct DMA address. To do this, we add a flags
variable and the RDMA_RW_CTX_FLAG_PCI_P2P flag. When the flag is
specified use the appropriate map function.
Hi Everyone,
Hi Logan,
Here's v2 of our series to introduce P2P based copy offload to NVMe
fabrics. This version has been rebased onto v4.16-rc3, which already
includes Christoph's devpagemap work the previous version was based
off, as well as a couple of the cleanup patches that were in v1.
Hey Bart,
+void irq_poll_init_am(struct irq_poll *iop, unsigned int nr_events,
+		unsigned short nr_levels, unsigned short start_level,
+		irq_poll_am_fn *amfn)
+{
+	iop->amfn = amfn;
+	irq_am_init(&iop->am, nr_events, nr_levels, start_level, irq_poll_am);
+}
This function has a
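If I read the series right, a consumer would wire it up roughly like this
(the numeric levels and the ib_cq_irq_am callback name are placeholders
of mine, not taken from the series):

/* hypothetical usage sketch: enable adaptive moderation on an irq-poll
 * context after the usual init */
irq_poll_init(&cq->iop, IB_POLL_BUDGET_IRQ, ib_poll_handler);
irq_poll_init_am(&cq->iop, IB_POLL_BUDGET_IRQ, 8 /* nr_levels */,
		 4 /* start_level */, ib_cq_irq_am);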
I think it'd be simpler to have blk_poll set it back to running if
need_resched is true rather than repeat this pattern across all the
callers:
---
diff --git a/block/blk-mq.c b/block/blk-mq.c
index df93102e2149..40285fe1c8ad 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -3164,6 +3164,7
Hi Tal,
I think Tal has idea/s on how the existing library can be changed to
support more modes/models
What I was thinking is allowing DIM algorithm to disregard data which is
0. Currently if bytes == 0 we return "SAME" immediately. We can change
it to simply move to the packets check (which
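Roughly, that tweak could look like the following inside the stats
comparison (a sketch against my memory of include/linux/net_dim.h; the
bpms/ppms fields mean bytes and packets per millisecond, and the plain
greater/less comparison stands in for the real significant-difference
check):

static int dim_stats_compare(const struct net_dim_stats *curr,
			     const struct net_dim_stats *prev)
{
	/* proposed: a byte-less workload no longer short-circuits to
	 * "same"; fall through and let the packet rate decide */
	if (prev->bpms || curr->bpms) {
		if (curr->bpms > prev->bpms)
			return NET_DIM_STATS_BETTER;
		if (curr->bpms < prev->bpms)
			return NET_DIM_STATS_WORSE;
	}
	if (curr->ppms != prev->ppms)
		return curr->ppms > prev->ppms ?
			NET_DIM_STATS_BETTER : NET_DIM_STATS_WORSE;
	return NET_DIM_STATS_SAME;
}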
Hey Or,
not any more, Andy and Tal made the mlx5 AM code a kernel library
which is called DIM
f4e5f0e MAINTAINERS: add entry for Dynamic Interrupt Moderation
6a8788f bnxt_en: add support for software dynamic interrupt moderation
8115b75 net/dim: use struct net_dim_sample as arg to net_dim
/decrement its current level (step) and schedules a program
dispatch work.
Signed-off-by: Sagi Grimberg <s...@grimberg.me>
---
include/linux/irq-am.h | 116 +++
lib/Kconfig | 5 ++
lib/Makefile | 1 +
lib/irq-am.c
Useful for local debugging
Signed-off-by: Sagi Grimberg <s...@grimberg.me>
---
include/linux/irq-am.h | 2 +
lib/irq-am.c | 109 +
2 files changed, 111 insertions(+)
diff --git a/include/linux/irq-am.h b/include/linux/irq-am.h
Update online stats for fired events and completions on
each poll cycle.
Also expose am initialization interface. The irqpoll consumer will
initialize the irq-am context of the irq-poll context.
Signed-off-by: Sagi Grimberg <s...@grimberg.me>
---
include/linux/irq_poll.h | 9
Currently activated via modparam, obviously we will want to
find a more generic way to control this.
Signed-off-by: Sagi Grimberg <s...@grimberg.me>
---
drivers/infiniband/core/cq.c | 48
1 file changed, 48 insertions(+)
diff --git a/d
is interested :)
Sagi Grimberg (5):
irq-am: Introduce helper library for adaptive moderation
implementation
irq-am: add some debugfs exposure on tuning state
irq_poll: wire up irq_am and allow to initialize it
IB/cq: add adaptive moderation support
IB/cq: wire up adaptive moderation
Signed-off-by: Sagi Grimberg <s...@grimberg.me>
---
drivers/infiniband/core/cq.c | 25 -
include/rdma/ib_verbs.h | 9 +++--
2 files changed, 27 insertions(+), 7 deletions(-)
diff --git a/drivers/infiniband/core/cq.c b/drivers/infiniband/core/cq.c
Hi Bart,
Another 2 cents from me :)
On Fri, Feb 2, 2018 at 6:05 PM, Bart Van Assche wrote:
On Fri, 2018-02-02 at 15:08 +0100, Roman Pen wrote:
o Simple configuration of IBNBD:
- Server side is completely passive: volumes do not need to be
explicitly exported.