On 9/27/18 4:55 PM, Omar Sandoval wrote:
> From: Omar Sandoval
>
> Hi,
>
> This is my series to improve the heuristics used by Kyber. Patches 1 and
> 2 are preparation. Patch 3 is a minor optimization. Patch 4 is the main
> change, and includes a detailed description of the new heuristics.
On Tue, 2018-09-18 at 17:18 -0700, Omar Sandoval wrote:
> On Tue, Sep 18, 2018 at 05:02:47PM -0700, Bart Van Assche wrote:
> > On 9/18/18 4:24 PM, Omar Sandoval wrote:
> > > On Tue, Sep 18, 2018 at 02:20:59PM -0700, Bart Van Assche wrote:
> > > > Can you have a look at the updated master branch of
From: Omar Sandoval
Hi,
This is my series to improve the heuristics used by Kyber. Patches 1 and
2 are preparation. Patch 3 is a minor optimization. Patch 4 is the main
change, and includes a detailed description of the new heuristics. Patch
5 adds tracepoints for debugging. This is basically
From: Omar Sandoval
Commit 4bc6339a583c ("block: move blk_stat_add() to
__blk_mq_end_request()") consolidated some calls using ktime_get() so
we'd only need to call it once. Kyber's ->completed_request() hook also
calls ktime_get(), so let's move it to the same place, too.
Signed-off-by: Omar
From: Omar Sandoval
Kyber will need this in a future change if it is built as a module.
Signed-off-by: Omar Sandoval
---
block/blk-stat.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/block/blk-stat.c b/block/blk-stat.c
index 7587b1c3caaf..90561af85a62 100644
--- a/block/blk-stat.c
+++
From: Omar Sandoval
When debugging Kyber, it's really useful to know what latencies we've
been having, how the domain depths have been adjusted, and if we've
actually been throttling. Add three tracepoints, kyber_latency,
kyber_adjust, and kyber_throttled, to record that.
Signed-off-by: Omar
From: Omar Sandoval
Kyber's current heuristics have a few flaws:
- It's based on the mean latency, but p99 latency tends to be more
meaningful to anyone who cares about latency. The mean can also be
skewed by rare outliers that the scheduler can't do anything about.
- The statistics
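The first flaw listed above (a mean skewed by rare outliers, versus p99) can be illustrated with a small userspace sketch. This is not the code from the patch: the helper names latency_mean() and latency_percentile() are hypothetical, and the nearest-rank percentile on a sorted array is only an assumption chosen for clarity, not necessarily how Kyber tracks latencies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Arithmetic mean of n latency samples (n must be > 0). */
static double latency_mean(const unsigned long *samples, size_t n)
{
	double sum = 0.0;

	assert(n > 0);
	for (size_t i = 0; i < n; i++)
		sum += (double)samples[i];
	return sum / (double)n;
}

/* qsort() comparator for unsigned long, ascending. */
static int cmp_ulong(const void *a, const void *b)
{
	unsigned long x = *(const unsigned long *)a;
	unsigned long y = *(const unsigned long *)b;

	return (x > y) - (x < y);
}

/*
 * Nearest-rank percentile: the smallest sample such that at least
 * pct percent of all samples are less than or equal to it.
 * Sorts the array in place.
 */
static unsigned long latency_percentile(unsigned long *samples, size_t n,
					unsigned int pct)
{
	size_t rank;

	assert(n > 0 && pct >= 1 && pct <= 100);
	qsort(samples, n, sizeof(*samples), cmp_ulong);
	rank = (n * pct + 99) / 100;	/* ceil(n * pct / 100) */
	return samples[rank - 1];
}
```

With 99 samples of 100 and a single outlier of 1000000, the mean is 10099 while the p99 stays at 100, which is the kind of skew the list above describes.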
From: Omar Sandoval
The domain token sbitmaps are currently initialized to the device queue
depth or 256, whichever is larger, and immediately resized to the
maximum depth for that domain (256, 128, or 64 for read, write, and
other, respectively). The sbitmap is never resized larger than that,
On 9/26/18 6:35 AM, Ilya Dryomov wrote:
> trace_block_unplug() takes true for explicit unplugs and false for
> implicit unplugs. schedule() unplugs are implicit and should be
> reported as timer unplugs. While correct in the legacy code, this has
> been inverted in blk-mq since 4.11.
That's
On Wed, Sep 26, 2018 at 02:35:50PM +0200, Ilya Dryomov wrote:
> trace_block_unplug() takes true for explicit unplugs and false for
> implicit unplugs. schedule() unplugs are implicit and should be
> reported as timer unplugs. While correct in the legacy code, this has
> been inverted in blk-mq
On Fri, 2018-09-21 at 07:48 +0200, Christoph Hellwig wrote:
> Can you resend this with the one easy fixup pointed out? It would
> be good to finally get the race fix merged.
Seconded. I also would like to see these patches being merged upstream.
Bart.
On 2018-09-27 11:12 AM, Keith Busch wrote:
> Reviewed-by: Keith Busch
Thanks for the reviews Keith!
Logan
On Thu, Sep 27, 2018 at 10:54:20AM -0600, Logan Gunthorpe wrote:
> We create a configfs attribute in each nvme-fabrics target port to
> enable p2p memory use. When enabled, the port will only then use the
> p2p memory if a p2p memory device can be found which is behind the
> same switch hierarchy
On Thu, Sep 27, 2018 at 10:54:16AM -0600, Logan Gunthorpe wrote:
> Register the CMB buffer as p2pmem and use the appropriate allocation
> functions to create and destroy the IO submission queues.
>
> If the CMB supports WDS and RDS, publish it for use as P2P memory
> by other devices.
>
>
On Thu, Sep 27, 2018 at 10:54:17AM -0600, Logan Gunthorpe wrote:
> For P2P requests, we must use the pci_p2pmem_map_sg() function
> instead of the dma_map_sg functions.
>
> With that, we can then indicate PCI_P2P support in the request queue.
> For this, we create an NVME_F_PCI_P2P flag which
On Thu, Sep 27, 2018 at 10:54:18AM -0600, Logan Gunthorpe wrote:
> Introduce a quirk to use CMB-like memory on older devices that have
> an exposed BAR but do not advertise support for using CMBLOC and
> CMBSIZE.
>
> We'd like to use some of these older cards to test P2P memory.
>
>
Add helpers to allocate and free the SGL in a struct nvmet_req:
int nvmet_req_alloc_sgl(struct nvmet_req *req, struct nvmet_sq *sq)
void nvmet_req_free_sgl(struct nvmet_req *req)
This will be expanded in a future patch to implement peer-to-peer
memory DMAs and should be common with all target
The DMA address used when mapping PCI P2P memory must be the PCI bus
address. Thus, introduce pci_p2pmem_map_sg() to map the correct
addresses when using P2P memory.
Memory mapped in this way does not need to be unmapped and thus if we
provided pci_p2pmem_unmap_sg() it would be empty. This breaks
In order to use PCI P2P memory the pci_p2pmem_map_sg() function must be
called to map the correct PCI bus address.
To do this, check the first page in the scatter list to see if it is P2P
memory or not. At the moment, scatter lists that contain P2P memory must
be homogeneous so if the first page
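A minimal userspace sketch of the first-page check described above. Everything here is hypothetical: struct sg_entry is a stand-in, not the kernel's struct scatterlist, and choose_map_path() and sgl_is_homogeneous() are illustrative names rather than functions from the patch. The sketch only shows why inspecting the first entry is enough when the list is required to be homogeneous.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one scatter-list entry. */
struct sg_entry {
	bool is_p2p;	/* true if the backing page is P2P memory */
};

enum map_path { MAP_DMA, MAP_P2P };

/*
 * Pick the mapping path from the first entry only. This is valid
 * solely under the assumption that the list is homogeneous.
 */
static enum map_path choose_map_path(const struct sg_entry *sgl, size_t nents)
{
	assert(sgl != NULL && nents > 0);
	return sgl[0].is_p2p ? MAP_P2P : MAP_DMA;
}

/* Debug helper: verify that every entry agrees with the first one. */
static bool sgl_is_homogeneous(const struct sg_entry *sgl, size_t nents)
{
	for (size_t i = 1; i < nents; i++)
		if (sgl[i].is_p2p != sgl[0].is_p2p)
			return false;
	return true;
}
```

A mixed list would take the wrong path for some of its entries, which is why the homogeneity requirement matters for this shortcut.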
Add a sysfs group to display statistics about P2P memory that is
registered in each PCI device.
Attributes in the group display the total amount of P2P memory, the
amount available and whether it is published or not.
Signed-off-by: Logan Gunthorpe
Acked-by: Bjorn Helgaas
---
QUEUE_FLAG_PCI_P2P is introduced meaning a driver's request queue
supports targeting P2P memory. This will be used by P2P providers and
orchestrators (in subsequent patches) to ensure block devices can
support P2P memory before submitting P2P backed pages to submit_bio().
Signed-off-by: Logan
Add a restructured text file describing how to write drivers
with support for P2P DMA transactions. The document describes
how to use the APIs that were added in the previous few
commits.
Also adds an index for the PCI documentation tree even though this
is the only PCI document that has been
We create a configfs attribute in each nvme-fabrics target port to
enable p2p memory use. When enabled, the port will only then use the
p2p memory if a p2p memory device can be found which is behind the
same switch hierarchy as the RDMA port and all the block devices in
use. If the user enabled it
Introduce a quirk to use CMB-like memory on older devices that have
an exposed BAR but do not advertise support for using CMBLOC and
CMBSIZE.
We'd like to use some of these older cards to test P2P memory.
Signed-off-by: Logan Gunthorpe
Reviewed-by: Sagi Grimberg
---
drivers/nvme/host/nvme.h |
Add a new directory in the driver API guide for PCI specific
documentation.
This is in preparation for adding a new PCI P2P DMA driver writers
guide which will go in this directory.
Signed-off-by: Logan Gunthorpe
Cc: Jonathan Corbet
Cc: Mauro Carvalho Chehab
Cc: Greg Kroah-Hartman
Cc: Vinod
Users of the P2PDMA infrastructure will typically need a way for
the user to tell the kernel to use P2P resources. Typically
this will be a simple on/off boolean operation but sometimes
it may be desirable for the user to specify the exact device to
use for the P2P operation.
Add new helpers for
Hi Everyone,
Here is version 7 of the PCI P2PDMA patch set. This version makes
a few minor changes from v6 and is based on v4.19-rc5. A git repo is here:
https://github.com/sbates130272/linux-p2pmem pci-p2p-v7
Now that we have Bjorn's Acks, I'd preferably like to get Jens's Ack for
Patch 7 and
Some PCI devices may have memory mapped in a BAR space that's
intended for use in peer-to-peer transactions. In order to enable
such transactions the memory must be registered with ZONE_DEVICE pages
so it can be used by DMA interfaces in existing drivers.
Add an interface for other subsystems to
Register the CMB buffer as p2pmem and use the appropriate allocation
functions to create and destroy the IO submission queues.
If the CMB supports WDS and RDS, publish it for use as P2P memory
by other devices.
Kernels without CONFIG_PCI_P2PDMA will also no longer support NVMe CMB.
However,
For P2P requests, we must use the pci_p2pmem_map_sg() function
instead of the dma_map_sg functions.
With that, we can then indicate PCI_P2P support in the request queue.
For this, we create an NVME_F_PCI_P2P flag which tells the core to
set QUEUE_FLAG_PCI_P2P in the request queue.
Signed-off-by:
On 9/27/18 9:57 AM, 国炬方 wrote:
> Yes, Guoju Fang. Thx. :)
OK, I made that change and committed it. Just be sure to use your full
name in the future for signoffs, etc.
--
Jens Axboe
On 9/27/18 9:41 AM, Coly Li wrote:
> From: guoju
This, and the signed-off, should use the full name. I can fix that up,
assuming Guoju Fang is the full name?
--
Jens Axboe
Hi Jens,
Guoju Fang just posted a bug fix for a bcache journal deadlock.
The bug most likely triggers when system memory is under heavy pressure;
the deadlock has been observed only occasionally over a long period.
If it is too late to go into Linux 4.19, I will submit to you in
4.20 merge window, but it will be
From: guoju
After the write to the SSD completes, bcache schedules journal_write work to
system_wq, which is a public workqueue in system, without WQ_MEM_RECLAIM
flag. system_wq is also a bound wq, and there may be no idle kworker on
current processor. Creating a new kworker may unfortunately need to
On Thu 27-09-18 20:35:27, Tetsuo Handa wrote:
> On 2018/09/27 20:27, Jan Kara wrote:
> > Hi,
> >
> > On Wed 26-09-18 00:26:49, Tetsuo Handa wrote:
> >> syzbot is reporting circular locking dependency between bdev->bd_mutex
> >> and lo->lo_ctl_mutex [1] which is caused by calling
loop_clr_fd() has a weird locking convention in that it expects
loop_ctl_mutex held, releases it on success and keeps it on failure.
Untangle the mess by moving locking of loop_ctl_mutex into
loop_clr_fd().
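The difference between the two conventions can be sketched in userspace with a pthread mutex. This is an illustration only, not the loop driver code: struct dev, clr_fd_old(), clr_fd_new(), and the -ENXIO return value are assumptions made for the example.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

static pthread_mutex_t ctl_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical device state. */
struct dev {
	bool bound;
};

/*
 * Old convention: the caller must already hold ctl_mutex. The lock is
 * released on success but is still held when an error is returned.
 */
static int clr_fd_old(struct dev *d)
{
	assert(d != NULL);
	if (!d->bound)
		return -ENXIO;			/* failure: lock stays held */
	d->bound = false;
	pthread_mutex_unlock(&ctl_mutex);	/* success: lock dropped */
	return 0;
}

/*
 * New convention: the function takes and releases ctl_mutex itself,
 * so the lock state after the call does not depend on the result.
 */
static int clr_fd_new(struct dev *d)
{
	int ret = 0;

	assert(d != NULL);
	pthread_mutex_lock(&ctl_mutex);
	if (!d->bound)
		ret = -ENXIO;
	else
		d->bound = false;
	pthread_mutex_unlock(&ctl_mutex);
	return ret;
}
```

With the second form, callers no longer need to branch on the return value just to know whether they still own the mutex.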
Signed-off-by: Jan Kara
---
drivers/block/loop.c | 49
Now that loop_ctl_mutex is global, just get rid of loop_index_mutex as
there is no good reason to keep these two separate and it just
complicates the locking.
Signed-off-by: Jan Kara
---
drivers/block/loop.c | 38 ++
1 file changed, 18 insertions(+), 20
Calling loop_reread_partitions() under loop_ctl_mutex causes lockdep to
complain about circular lock dependency between bdev->bd_mutex and
lo->lo_ctl_mutex. The problem is that on loop device open or close
lo_open() and lo_release() get called with bdev->bd_mutex held and they
need to acquire
__loop_release() has a single call site. Fold it there. This is
currently not a huge win but it will make following replacement of
loop_index_mutex more obvious.
Signed-off-by: Jan Kara
---
drivers/block/loop.c | 16 +++-
1 file changed, 7 insertions(+), 9 deletions(-)
diff --git
Calling blkdev_reread_part() under loop_ctl_mutex causes lockdep to
complain about circular lock dependency between bdev->bd_mutex and
lo->lo_ctl_mutex. The problem is that on loop device open or close
lo_open() and lo_release() get called with bdev->bd_mutex held and they
need to acquire
Push loop_ctl_mutex down to loop_set_status(). We will need this to be
able to call loop_reread_partitions() without loop_ctl_mutex.
Signed-off-by: Jan Kara
---
drivers/block/loop.c | 51 +--
1 file changed, 25 insertions(+), 26 deletions(-)
diff
Push lo_ctl_mutex down to loop_set_fd(). We will need this to be able to
call loop_reread_partitions() without lo_ctl_mutex.
Signed-off-by: Jan Kara
---
drivers/block/loop.c | 26 ++
1 file changed, 14 insertions(+), 12 deletions(-)
diff --git a/drivers/block/loop.c
The call of __blkdev_reread_part() from loop_reread_partitions() happens
only when we need to invalidate partitions from loop_release(). Thus
move a detection for this into loop_clr_fd() and simplify
loop_reread_partitions().
This makes loop_reread_partitions() safe to use without loop_ctl_mutex
Push loop_ctl_mutex down to loop_change_fd(). We will need this to be
able to call loop_reread_partitions() without loop_ctl_mutex.
Signed-off-by: Jan Kara
---
drivers/block/loop.c | 22 +++---
1 file changed, 11 insertions(+), 11 deletions(-)
diff --git a/drivers/block/loop.c
Push acquisition of lo_ctl_mutex down into individual ioctl handling
branches. This is a preparatory step for pushing the lock down into
individual ioctl handling functions so that they can release the lock as
they need it. We also factor out some simple ioctl handlers that will
not need any
Hi,
this patch series fixes oops and possible deadlocks as reported by syzbot [1]
[2]. The second patch in the series (from Tetsuo) fixes the oops, the remaining
patches are cleaning up the locking in the loop driver so that we can in the
end reasonably easily switch to rereading partitions
Push loop_ctl_mutex down to loop_get_status() to avoid the unusual
convention that the function gets called with loop_ctl_mutex held and
releases it.
Signed-off-by: Jan Kara
---
drivers/block/loop.c | 37 ++---
1 file changed, 10 insertions(+), 27 deletions(-)
Move setting of lo_state to Lo_rundown out into the callers. That will
allow us to unlock loop_ctl_mutex while the loop device is protected
from other changes by its special state.
Signed-off-by: Jan Kara
---
drivers/block/loop.c | 52 +++-
1 file
From: Tetsuo Handa
syzbot is reporting NULL pointer dereference [1] which is caused by
race condition between ioctl(loop_fd, LOOP_CLR_FD, 0) versus
ioctl(other_loop_fd, LOOP_SET_FD, loop_fd) due to traversing other
loop devices at loop_validate_file() without holding corresponding
From: Tetsuo Handa
vfs_getattr() needs "struct path" rather than "struct file".
Let's use path_get()/path_put() rather than get_file()/fput().
Signed-off-by: Tetsuo Handa
Reviewed-by: Jan Kara
Signed-off-by: Jan Kara
---
drivers/block/loop.c | 10 +-
1 file changed, 5 insertions(+),
On 2018/09/27 20:27, Jan Kara wrote:
> Hi,
>
> On Wed 26-09-18 00:26:49, Tetsuo Handa wrote:
>> syzbot is reporting circular locking dependency between bdev->bd_mutex
>> and lo->lo_ctl_mutex [1] which is caused by calling blkdev_reread_part()
>> with lock held. We need to drop lo->lo_ctl_mutex in
Hi,
On Wed 26-09-18 00:26:49, Tetsuo Handa wrote:
> syzbot is reporting circular locking dependency between bdev->bd_mutex
> and lo->lo_ctl_mutex [1] which is caused by calling blkdev_reread_part()
> with lock held. We need to drop lo->lo_ctl_mutex in order to fix it.
>
> This patch fixes it by
On Wed, Sep 26, 2018 at 11:24:55AM -0700, Bart Van Assche wrote:
> On Wed, 2018-09-26 at 17:06 +0200, Johannes Thumshirn wrote:
> > On Wed, Sep 26, 2018 at 04:57:32PM +0200, Christoph Hellwig wrote:
> > > I don't think this actually works given that rpm_status only exists
> > > if CONFIG_PM is