Re: [RFC 5/8] scatterlist: Modify SG copy functions to support io memory.

2017-04-07 Thread Logan Gunthorpe
Hi Dan,

On 03/04/17 06:07 PM, Dan Williams wrote:
> The completely agnostic part is where I get worried, but I shouldn't
> say anymore until I actually read the patch.The worry is cases where
> this agnostic enabling allows unsuspecting code paths to do the wrong
> thing. Like bypass iomem safety.

Yup, you're right the iomem safety issue is a really difficult problem.
I think replacing struct page with pfn_t in a bunch of places is
probably going to be a requirement for my work. However, this is going
to be a very large undertaking.

I've done an audit of sg_page users and there will indeed be some
difficult cases. However, I'm going to start doing some cleanup and
semantic changes to hopefully move in that direction. The first step
I've chosen to look at is to create an sg_kmap interface which replaces
about 77 (out of ~340) sg_page users. I'm hoping the new interface can
have the semantic that sg_kmap can fail (which would happen in the case
that no suitable page exists).

Eventually, I'd want to get to a place where sg_page either doesn't
exists or can fail and is always checked. At that point swapping out
pfn_t in the sgl would be manageable.

Thoughts?

Logan


Re: [RFC 5/8] scatterlist: Modify SG copy functions to support io memory.

2017-04-03 Thread Dan Williams
On Mon, Apr 3, 2017 at 4:12 PM, Logan Gunthorpe  wrote:
>
>
> On 03/04/17 04:47 PM, Dan Williams wrote:
>> I wouldn't necessarily conflate supporting pfn_t in the scatterlist
>> with the stalled stuct-page-less DMA effor. A pfn_t_to_page()
>> conversion will still work and be required. However you're right, the
>> minute we use pfn_t for this we're into the realm of special case
>> drivers that understand scatterlists with special "I/O-pfn_t" entries.
>
> Well yes, it would certainly be possible to convert the scatterlist code
> from page_link to pfn_t. (The only slightly tricky thing is that
> scatterlist uses extra chaining bits and pfn_t uses extra flag bits so
> they'd have to be harmonized somehow). But if we aren't moving toward
> struct-page-less DMA, I fail to see the point of the conversion.
>
> I'll definitely need IO scatterlists of some form or another and I like
> pfn_t but right now it just seems like extra work with unclear benefit.
> (Though, if someone told me that I can't use a third bit in the
> page_link field then maybe that would be a good reason to move to pfn_t.)
>
>> However, maybe that's what we want? I think peer-to-peer DMA is not a
>> general purpose feature unless/until we get it standardized in PCI. So
>> maybe drivers with special case scatterlist support is exactly what we
>> want for now.
>
> Well, I think this should be completely independent from PCI code. I see
> no reason why we can't have infrastructure for DMA on iomem from any
> bus. Largely all the work I've done in this area is completely agnostic
> to the bus in use. (Except for any kind of white/black list when it is
> used.)

The completely agnostic part is where I get worried, but I shouldn't
say anymore until I actually read the patch.The worry is cases where
this agnostic enabling allows unsuspecting code paths to do the wrong
thing. Like bypass iomem safety.

> The "special case scatterlist" is essentially what I'm proposing in the
> patch I sent upthread, it just stores the flag in the page_link instead
> of in a pfn_t.

Makes sense. The suggestion of pfn_t was to try to get more type
safety throughout the stack. So that, again, unsuspecting code paths
that get an I/O pfn aren't able to do things like page_address() or
kmap() without failing.

I'll stop commenting now and set aside some time to go read the patches.


Re: [RFC 5/8] scatterlist: Modify SG copy functions to support io memory.

2017-04-03 Thread Logan Gunthorpe


On 03/04/17 04:47 PM, Dan Williams wrote:
> I wouldn't necessarily conflate supporting pfn_t in the scatterlist
> with the stalled stuct-page-less DMA effor. A pfn_t_to_page()
> conversion will still work and be required. However you're right, the
> minute we use pfn_t for this we're into the realm of special case
> drivers that understand scatterlists with special "I/O-pfn_t" entries.

Well yes, it would certainly be possible to convert the scatterlist code
from page_link to pfn_t. (The only slightly tricky thing is that
scatterlist uses extra chaining bits and pfn_t uses extra flag bits so
they'd have to be harmonized somehow). But if we aren't moving toward
struct-page-less DMA, I fail to see the point of the conversion.

I'll definitely need IO scatterlists of some form or another and I like
pfn_t but right now it just seems like extra work with unclear benefit.
(Though, if someone told me that I can't use a third bit in the
page_link field then maybe that would be a good reason to move to pfn_t.)

> However, maybe that's what we want? I think peer-to-peer DMA is not a
> general purpose feature unless/until we get it standardized in PCI. So
> maybe drivers with special case scatterlist support is exactly what we
> want for now.

Well, I think this should be completely independent from PCI code. I see
no reason why we can't have infrastructure for DMA on iomem from any
bus. Largely all the work I've done in this area is completely agnostic
to the bus in use. (Except for any kind of white/black list when it is
used.)

The "special case scatterlist" is essentially what I'm proposing in the
patch I sent upthread, it just stores the flag in the page_link instead
of in a pfn_t.

Logan


Re: [RFC 5/8] scatterlist: Modify SG copy functions to support io memory.

2017-04-03 Thread Dan Williams
On Mon, Apr 3, 2017 at 3:10 PM, Logan Gunthorpe  wrote:
>
>
> On 03/04/17 03:44 PM, Dan Williams wrote:
>> On Mon, Apr 3, 2017 at 2:20 PM, Logan Gunthorpe  wrote:
>>> Hi Christoph,
>>>
>>> What are your thoughts on an approach like the following untested
>>> draft patch.
>>>
>>> The patch (if fleshed out) makes it so iomem can be used in an sgl
>>> and WARN_ONs will occur in places where drivers attempt to access
>>> iomem directly through the sgl.
>>>
>>> I'd also probably create a p2pmem_alloc_sgl helper function so driver
>>> writers wouldn't have to mess with sg_set_iomem_page.
>>>
>>> With all that in place, it should be relatively safe for drivers to
>>> implement p2pmem even though we'd still technically be violating the
>>> __iomem boundary in some places.
>>
>> Just reacting to this mail, I still haven't had a chance to take a
>> look at the rest of the series.
>>
>> The pfn_t type was invented to carry extra type and page lookup
>> information about the memory behind a given pfn. At first glance that
>> seems a more natural place to carry an indication that this is an
>> "I/O" pfn.
>
> I agree... But what are the plans for pfn_t? Is anyone working on using
> it in the scatterlist code? Currently it's not there yet and given the
> assertion that we will continue to be using struct page for DMA is that
> a direction we'd want to go?
>

I wouldn't necessarily conflate supporting pfn_t in the scatterlist
with the stalled stuct-page-less DMA effor. A pfn_t_to_page()
conversion will still work and be required. However you're right, the
minute we use pfn_t for this we're into the realm of special case
drivers that understand scatterlists with special "I/O-pfn_t" entries.
However, maybe that's what we want? I think peer-to-peer DMA is not a
general purpose feature unless/until we get it standardized in PCI. So
maybe drivers with special case scatterlist support is exactly what we
want for now.

Thoughts?


Re: [RFC 5/8] scatterlist: Modify SG copy functions to support io memory.

2017-04-03 Thread Logan Gunthorpe


On 03/04/17 03:44 PM, Dan Williams wrote:
> On Mon, Apr 3, 2017 at 2:20 PM, Logan Gunthorpe  wrote:
>> Hi Christoph,
>>
>> What are your thoughts on an approach like the following untested
>> draft patch.
>>
>> The patch (if fleshed out) makes it so iomem can be used in an sgl
>> and WARN_ONs will occur in places where drivers attempt to access
>> iomem directly through the sgl.
>>
>> I'd also probably create a p2pmem_alloc_sgl helper function so driver
>> writers wouldn't have to mess with sg_set_iomem_page.
>>
>> With all that in place, it should be relatively safe for drivers to
>> implement p2pmem even though we'd still technically be violating the
>> __iomem boundary in some places.
> 
> Just reacting to this mail, I still haven't had a chance to take a
> look at the rest of the series.
> 
> The pfn_t type was invented to carry extra type and page lookup
> information about the memory behind a given pfn. At first glance that
> seems a more natural place to carry an indication that this is an
> "I/O" pfn.

I agree... But what are the plans for pfn_t? Is anyone working on using
it in the scatterlist code? Currently it's not there yet and given the
assertion that we will continue to be using struct page for DMA is that
a direction we'd want to go?

Logan


Re: [RFC 5/8] scatterlist: Modify SG copy functions to support io memory.

2017-04-03 Thread Dan Williams
On Mon, Apr 3, 2017 at 2:20 PM, Logan Gunthorpe  wrote:
> Hi Christoph,
>
> What are your thoughts on an approach like the following untested
> draft patch.
>
> The patch (if fleshed out) makes it so iomem can be used in an sgl
> and WARN_ONs will occur in places where drivers attempt to access
> iomem directly through the sgl.
>
> I'd also probably create a p2pmem_alloc_sgl helper function so driver
> writers wouldn't have to mess with sg_set_iomem_page.
>
> With all that in place, it should be relatively safe for drivers to
> implement p2pmem even though we'd still technically be violating the
> __iomem boundary in some places.

Just reacting to this mail, I still haven't had a chance to take a
look at the rest of the series.

The pfn_t type was invented to carry extra type and page lookup
information about the memory behind a given pfn. At first glance that
seems a more natural place to carry an indication that this is an
"I/O" pfn.


Re: [RFC 5/8] scatterlist: Modify SG copy functions to support io memory.

2017-04-03 Thread Logan Gunthorpe
Hi Christoph,

What are your thoughts on an approach like the following untested
draft patch.

The patch (if fleshed out) makes it so iomem can be used in an sgl
and WARN_ONs will occur in places where drivers attempt to access
iomem directly through the sgl.

I'd also probably create a p2pmem_alloc_sgl helper function so driver
writers wouldn't have to mess with sg_set_iomem_page.

With all that in place, it should be relatively safe for drivers to
implement p2pmem even though we'd still technically be violating the
__iomem boundary in some places.

Logan


commit b435a154a4ec4f82766f6ab838092c3c5a9388ac
Author: Logan Gunthorpe 
Date:   Wed Feb 8 12:44:52 2017 -0700

scatterlist: Add support for iomem pages

This patch steals another bit from the page_link field to indicate the
sg points to iomem. In sg_copy_buffer we use this flag to select
between memcpy and iomemcpy. Other sg_miter users will get an WARN_ON
unless they indicate they support iomemory by setting the
SG_MITER_IOMEM flag.

Also added are sg_kmap functions which would replace a common pattern
of kmap(sg_page(sg)). These new functions then also warn if the caller
tries to map io memory. Another option may be to automatically copy
the iomem to a new page and return that transparently to the driver.

Another coccinelle patch would then be done to convert kmap(sg_page(sg))
instances to the appropriate sg_kmap calls.

Signed-off-by: Logan Gunthorpe 

diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index 0007b79..bd690a2c 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -37,6 +37,9 @@

 #include 

+/* Avoid the highmem.h macro from aliasing our ops->kunmap_atomic */
+#undef kunmap_atomic
+
 static inline int is_dma_buf_file(struct file *);

 struct dma_buf_list {
diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h
index cb3c8fe..7608da0 100644
--- a/include/linux/scatterlist.h
+++ b/include/linux/scatterlist.h
@@ -5,6 +5,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 

 struct scatterlist {
@@ -53,6 +54,9 @@ struct sg_table {
  *
  * If bit 1 is set, then this sg entry is the last element in a list.
  *
+ * We also use bit 2 to indicate whether the page_link points to an
+ * iomem page or not.
+ *
  * See sg_next().
  *
  */
@@ -64,10 +68,17 @@ struct sg_table {
  * a valid sg entry, or whether it points to the start of a new
scatterlist.
  * Those low bits are there for everyone! (thanks mason :-)
  */
-#define sg_is_chain(sg)((sg)->page_link & 0x01)
-#define sg_is_last(sg) ((sg)->page_link & 0x02)
+#define PAGE_LINK_MASK 0x7
+#define PAGE_LINK_CHAIN0x1
+#define PAGE_LINK_LAST 0x2
+#define PAGE_LINK_IOMEM0x4
+
+#define sg_is_chain(sg)((sg)->page_link & PAGE_LINK_CHAIN)
+#define sg_is_last(sg) ((sg)->page_link & PAGE_LINK_LAST)
 #define sg_chain_ptr(sg)   \
-   ((struct scatterlist *) ((sg)->page_link & ~0x03))
+   ((struct scatterlist *) ((sg)->page_link & ~(PAGE_LINK_CHAIN | \
+PAGE_LINK_LAST)))
+#define sg_is_iomem(sg)((sg)->page_link & PAGE_LINK_IOMEM)

 /**
  * sg_assign_page - Assign a given page to an SG entry
@@ -81,13 +92,13 @@ struct sg_table {
  **/
 static inline void sg_assign_page(struct scatterlist *sg, struct page
*page)
 {
-   unsigned long page_link = sg->page_link & 0x3;
+   unsigned long page_link = sg->page_link & PAGE_LINK_MASK;

/*
 * In order for the low bit stealing approach to work, pages
-* must be aligned at a 32-bit boundary as a minimum.
+* must be aligned at a 64-bit boundary as a minimum.
 */
-   BUG_ON((unsigned long) page & 0x03);
+   BUG_ON((unsigned long) page & PAGE_LINK_MASK);
 #ifdef CONFIG_DEBUG_SG
BUG_ON(sg->sg_magic != SG_MAGIC);
BUG_ON(sg_is_chain(sg));
@@ -117,13 +128,56 @@ static inline void sg_set_page(struct scatterlist
*sg, struct page *page,
sg->length = len;
 }

+/**
+ * sg_set_page - Set sg entry to point at given iomem page
+ * @sg: SG entry
+ * @page:   The page
+ * @len:Length of data
+ * @offset: Offset into page
+ *
+ * Description:
+ *   Same as sg_set_page but used when the page is a ZONE_DEVICE page that
+ *   points to IO memory.
+ *
+ **/
+static inline void sg_set_iomem_page(struct scatterlist *sg, struct
page *page,
+unsigned int len, unsigned int offset)
+{
+   sg_set_page(sg, page, len, offset);
+   sg->page_link |= PAGE_LINK_IOMEM;
+}
+
 static inline struct page *sg_page(struct scatterlist *sg)
 {
 #ifdef CONFIG_DEBUG_SG
BUG_ON(sg->sg_magic != SG_MAGIC);
BUG_ON(sg_is_chain(sg));
 #endif
-   return (struct page *)((sg)->page_link & ~0x3);
+   return (struct page *)((sg)->page_link & 

Re: [RFC 5/8] scatterlist: Modify SG copy functions to support io memory.

2017-03-31 Thread Logan Gunthorpe


On 31/03/17 01:09 AM, Christoph Hellwig wrote:
> You're calling memcpy_{to,from}_iomem on non-__iomem pointers.  This
> is a fundamental no-go as we keep I/O memory separate from kernel
> pointers.

Yes, that's true, however I don't know how we could get around that when
the iomem is referenced by struct pages inside a scatter gather list. Do
we need to now have special __iomem sgls? And even still, I'm not sure
how that could work when the nvme target code is using the same sgls to
sometimes point to iomem and sometimes point to regular memory.

I'm certainly open to suggestions, though.

Logan



Re: [RFC 5/8] scatterlist: Modify SG copy functions to support io memory.

2017-03-31 Thread Christoph Hellwig
You're calling memcpy_{to,from}_iomem on non-__iomem pointers.  This
is a fundamental no-go as we keep I/O memory separate from kernel
pointers.


[RFC 5/8] scatterlist: Modify SG copy functions to support io memory.

2017-03-30 Thread Logan Gunthorpe
Now that we are using p2pmem SG buffers we occasionally have to copy
to and from this memory. For this, we add an iomem flag to
sg_copy_buffer for copying with iomemcpy. We also add the sg_iocopy_
variants to use this more easily.

Signed-off-by: Logan Gunthorpe 
Signed-off-by: Stephen Bates 
Signed-off-by: Steve Wise 
---
 drivers/scsi/scsi_debug.c   |  7 ++---
 include/linux/scatterlist.h |  7 -
 lib/scatterlist.c   | 64 ++---
 3 files changed, 65 insertions(+), 13 deletions(-)

diff --git a/drivers/scsi/scsi_debug.c b/drivers/scsi/scsi_debug.c
index 17249c3..70c0d9f 100644
--- a/drivers/scsi/scsi_debug.c
+++ b/drivers/scsi/scsi_debug.c
@@ -1309,7 +1309,7 @@ static int resp_inquiry(struct scsi_cmnd *scp, struct 
sdebug_dev_info *devip)
int lu_id_num, port_group_id, target_dev_id, len;
char lu_id_str[6];
int host_no = devip->sdbg_host->shost->host_no;
-   
+
port_group_id = (((host_no + 1) & 0x7f) << 8) +
(devip->channel & 0x7f);
if (sdebug_vpd_use_hostno == 0)
@@ -2381,14 +2381,15 @@ static int do_device_access(struct scsi_cmnd *scmd, u64 
lba, u32 num,
 
ret = sg_copy_buffer(sdb->table.sgl, sdb->table.nents,
   fake_storep + (block * sdebug_sector_size),
-  (num - rest) * sdebug_sector_size, 0, do_write);
+  (num - rest) * sdebug_sector_size, 0, do_write, false);
if (ret != (num - rest) * sdebug_sector_size)
return ret;
 
if (rest) {
ret += sg_copy_buffer(sdb->table.sgl, sdb->table.nents,
fake_storep, rest * sdebug_sector_size,
-   (num - rest) * sdebug_sector_size, do_write);
+   (num - rest) * sdebug_sector_size, do_write,
+ false);
}
 
return ret;
diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h
index cb3c8fe..030b92b 100644
--- a/include/linux/scatterlist.h
+++ b/include/linux/scatterlist.h
@@ -267,7 +267,7 @@ int sg_alloc_table_from_pages(struct sg_table *sgt,
gfp_t gfp_mask);
 
 size_t sg_copy_buffer(struct scatterlist *sgl, unsigned int nents, void *buf,
- size_t buflen, off_t skip, bool to_buffer);
+ size_t buflen, off_t skip, bool to_buffer, bool iomem);
 
 size_t sg_copy_from_buffer(struct scatterlist *sgl, unsigned int nents,
   const void *buf, size_t buflen);
@@ -279,6 +279,11 @@ size_t sg_pcopy_from_buffer(struct scatterlist *sgl, 
unsigned int nents,
 size_t sg_pcopy_to_buffer(struct scatterlist *sgl, unsigned int nents,
  void *buf, size_t buflen, off_t skip);
 
+size_t sg_iocopy_from_buffer(struct scatterlist *sgl, unsigned int nents,
+const void *buf, size_t buflen);
+size_t sg_iocopy_to_buffer(struct scatterlist *sgl, unsigned int nents,
+  void *buf, size_t buflen);
+
 /*
  * Maximum number of entries that will be allocated in one piece, if
  * a list larger than this is required then chaining will be utilized.
diff --git a/lib/scatterlist.c b/lib/scatterlist.c
index c6cf822..22abd94 100644
--- a/lib/scatterlist.c
+++ b/lib/scatterlist.c
@@ -647,7 +647,7 @@ EXPORT_SYMBOL(sg_miter_stop);
  *
  **/
 size_t sg_copy_buffer(struct scatterlist *sgl, unsigned int nents, void *buf,
- size_t buflen, off_t skip, bool to_buffer)
+ size_t buflen, off_t skip, bool to_buffer, bool iomem)
 {
unsigned int offset = 0;
struct sg_mapping_iter miter;
@@ -668,10 +668,17 @@ size_t sg_copy_buffer(struct scatterlist *sgl, unsigned 
int nents, void *buf,
 
len = min(miter.length, buflen - offset);
 
-   if (to_buffer)
-   memcpy(buf + offset, miter.addr, len);
-   else
-   memcpy(miter.addr, buf + offset, len);
+   if (iomem) {
+   if (to_buffer)
+   memcpy_fromio(buf + offset,  miter.addr, len);
+   else
+   memcpy_toio(miter.addr, buf + offset, len);
+   } else {
+   if (to_buffer)
+   memcpy(buf + offset, miter.addr, len);
+   else
+   memcpy(miter.addr, buf + offset, len);
+   }
 
offset += len;
}
@@ -695,7 +702,8 @@ EXPORT_SYMBOL(sg_copy_buffer);
 size_t sg_copy_from_buffer(struct scatterlist *sgl, unsigned int nents,
   const void *buf, size_t buflen)
 {
-   return sg_copy_buffer(sgl, nents, (void *)buf, buflen, 0, false);
+   return sg_copy_buffer(sgl, nents,