Re: iomapping a big endian area

2005-04-04 Thread Benjamin Herrenschmidt
On Sat, 2005-04-02 at 22:27 -0600, James Bottomley wrote: On Sat, 2005-04-02 at 20:08 -0800, David S. Miller wrote: Did anyone have a preference for the API? I was thinking ioread32_native, but ioread32be is fine too. I think doing foo{be,le}{8,16,32}() would be consistent with our
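
To illustrate the {be,le} naming discussed above, here is a minimal userspace sketch of explicit-endian 32-bit reads. The helper names sketch_read32be and sketch_read32le are hypothetical and operate on an ordinary byte buffer; they are not the kernel accessors and only show the byte ordering each suffix implies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: assemble a 32-bit value from big-endian bytes. */
static inline uint32_t sketch_read32be(const volatile uint8_t *p)
{
    assert(p != NULL);
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Hypothetical helper: assemble a 32-bit value from little-endian bytes. */
static inline uint32_t sketch_read32le(const volatile uint8_t *p)
{
    assert(p != NULL);
    return ((uint32_t)p[3] << 24) | ((uint32_t)p[2] << 16) |
           ((uint32_t)p[1] << 8)  |  (uint32_t)p[0];
}
```

Because the result is built from individual bytes, it does not depend on the endianness of the CPU running the code, which is the point of carrying the byte order in the function name.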

Re: [PATCH v2, part 1 3/9] PCI: Convert alloc_pci_dev(void) to pci_alloc_dev(bus) instead

2013-05-15 Thread Benjamin Herrenschmidt
On Wed, 2013-05-15 at 22:46 +0800, Liu Jiang wrote: I don't know any OF experts, could you please help to CC some OF experts? I wrote that code I think. Sorry, I've missed the beginning of the thread, what is the problem ? Cheers, Ben.

SCSI breakage on non-cache coherent architectures

2007-11-18 Thread Benjamin Herrenschmidt
Hi James ! (Please CC me on replies as I'm not subscribed to linux-scsi) I've been debugging various issues on the PowerPC 44x embedded architecture which happens to have non-coherent PCI DMA. One of the problems I'm hitting is that one really needs to enforce kmalloc alignment to cache lines or

Re: SCSI breakage on non-cache coherent architectures

2007-11-19 Thread Benjamin Herrenschmidt
On Mon, 2007-11-19 at 00:38 -0800, David Miller wrote: From: Benjamin Herrenschmidt [EMAIL PROTECTED] Date: Mon, 19 Nov 2007 16:35:23 +1100 I'm not sure what is the best way to fix that. Internally, I've done some test whacking some cacheline_aligned in the scsi_cmnd data structure

Re: SCSI breakage on non-cache coherent architectures

2007-11-19 Thread Benjamin Herrenschmidt
On Mon, 2007-11-19 at 05:32 -0700, Matthew Wilcox wrote: On Mon, Nov 19, 2007 at 04:35:23PM +1100, Benjamin Herrenschmidt wrote: The other one I'm hitting now is that the SCSI layer nowadays embeds the 'nowadays'? It has always been so. Wasn't it kmalloc'ed at one point ? sense_buffer

Re: SCSI breakage on non-cache coherent architectures

2007-11-19 Thread Benjamin Herrenschmidt
On Mon, 2007-11-19 at 09:09 -0600, James Bottomley wrote: What other drivers do is DMA to their own allocation and then memcpy to the sense buffer. There is a movement to allocate the sense data as its own sg list, but I don't think that patch has even been posted yet. I'd like to

Re: SCSI breakage on non-cache coherent architectures

2007-11-19 Thread Benjamin Herrenschmidt
I'd like to be rid of it inside the command for various reasons: every command has one of these, and they're expensive in the allocation (at 96 bytes). There's no reason we have to allocate and free that amount of space with every command. In theory, the number of these is bounded at the

Re: SCSI breakage on non-cache coherent architectures

2007-11-19 Thread Benjamin Herrenschmidt
On Mon, 2007-11-19 at 13:43 -0800, Roland Dreier wrote: I've been debugging various issues on the PowerPC 44x embedded architecture which happens to have non-coherent PCI DMA. One of the problems I'm hitting is that one really needs to enforce kmalloc alignment to cache lines or bad

Re: SCSI breakage on non-cache coherent architectures

2007-11-19 Thread Benjamin Herrenschmidt
On Mon, 2007-11-19 at 14:31 -0800, David Miller wrote: From: Benjamin Herrenschmidt [EMAIL PROTECTED] Date: Tue, 20 Nov 2007 06:51:14 +1100 On Mon, 2007-11-19 at 00:38 -0800, David Miller wrote: From: Benjamin Herrenschmidt [EMAIL PROTECTED] Date: Mon, 19 Nov 2007 16:35:23 +1100

Re: SCSI breakage on non-cache coherent architectures

2007-11-19 Thread Benjamin Herrenschmidt
On Mon, 2007-11-19 at 16:46 -0800, David Miller wrote: 1) Require that entire buffers are committed by call sites, and thus embedding DMA'd within non-DMA stuff isn't allowed 2) Add the __dma_cacheline_aligned tag. But note that with #2 it could get quite ugly because the alignment

Re: SCSI breakage on non-cache coherent architectures

2007-11-19 Thread Benjamin Herrenschmidt
On Mon, 2007-11-19 at 18:10 -0800, Roland Dreier wrote: I wrapped this ugliness up inside the macro back in what I posted in 2002 (http://lkml.org/lkml/2002/6/12/234): #define __dma_buffer __dma_buffer_line(__LINE__) #define __dma_buffer_line(line) __dma_buffer_expand_line(line) #define

Re: SCSI breakage on non-cache coherent architectures

2007-11-19 Thread Benjamin Herrenschmidt
FYI, Here's what I have for the SCSI change. I haven't updated drivers to care for the new return code though, help appreciated with that as I don't know much about these drivers. Index: linux-work/drivers/scsi/scsi_error.c === ---

Re: SCSI breakage on non-cache coherent architectures

2007-11-20 Thread Benjamin Herrenschmidt
On Tue, 2007-11-20 at 15:10 -0600, James Bottomley wrote: We're talking about trying to fix this for 2.4; which is already at -rc3 ... Is an entire arch change for dma alignment really a merge candidate at this stage? Well, as I said before... it's a matter of what seems to be the less likely

[PATCH 2/2] scsi: Use new __dma_buffer to align sense buffer in scsi_cmnd

2007-12-20 Thread Benjamin Herrenschmidt
, which leads to various forms of corruption. This uses the newly defined __dma_buffer annotation to enforce that on such platforms, the sense_buffer is contained within its own cache line. This has no effect on cache coherent architectures. Signed-off-by: Benjamin Herrenschmidt [EMAIL PROTECTED
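
The idea of keeping the sense buffer in its own cache line can be sketched in userspace C as follows. This is an illustration under stated assumptions, not the patch itself: the 64-byte line size, the struct names and the SKETCH_ macros are all hypothetical, and the aligned attribute is the GCC/Clang extension. Only the 96-byte sense buffer size comes from the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_DMA_ALIGN  64   /* assumed cache line size, for illustration */
#define SKETCH_SENSE_SIZE 96   /* sense buffer size quoted in the thread */

/*
 * Aligning the wrapper type makes the member start on a line boundary and
 * pads the type's size up to a multiple of the alignment, so no neighbouring
 * field can share a cache line with the DMA target.
 */
struct sketch_dma_sense {
    uint8_t data[SKETCH_SENSE_SIZE];
} __attribute__((aligned(SKETCH_DMA_ALIGN)));

/* Hypothetical stand-in for a command structure with an embedded buffer. */
struct sketch_cmnd {
    int result;
    struct sketch_dma_sense sense;
    int retries;
};

static_assert(offsetof(struct sketch_cmnd, sense) % SKETCH_DMA_ALIGN == 0,
              "sense buffer must start on a cache line boundary");
static_assert(sizeof(struct sketch_dma_sense) % SKETCH_DMA_ALIGN == 0,
              "sense buffer must occupy whole cache lines");
```

On a cache coherent platform the annotation would presumably expand to nothing, which matches the statement that the change has no effect there; the cost on non-coherent platforms is the padding around the buffer.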

[PATCH 1/2] DMA buffer alignment annotations

2007-12-20 Thread Benjamin Herrenschmidt
be DMA'ed to. On non-coherent platforms, this causes various corruptions as this cache line is shared with various other fields of the scsi_cmnd data structure. Signed-off-by: Benjamin Herrenschmidt [EMAIL PROTECTED] --- Documentation/DMA-mapping.txt | 32 include/asm

Re: [PATCH 1/2] DMA buffer alignment annotations

2007-12-21 Thread Benjamin Herrenschmidt
On Fri, 2007-12-21 at 09:39 +, Russell King wrote: +#ifndef ARCH_MIN_DMA_ALIGNMENT +#define __dma_aligned +#define __dma_buffer +#else +#define __dma_aligned __attribute__((aligned(ARCH_MIN_DMA_ALIGNMENT))) +#define __dma_buffer

Re: [PATCH 2/2] scsi: Use new __dma_buffer to align sense buffer in scsi_cmnd

2007-12-21 Thread Benjamin Herrenschmidt
On Fri, 2007-12-21 at 10:33 +, Alan Cox wrote: On Fri, 21 Dec 2007 13:30:08 +1100 Benjamin Herrenschmidt [EMAIL PROTECTED] wrote: The sense buffer in scsi_cmnd can nowadays be DMA'ed into directly by some low level drivers (that typically happens with USB mass storage). Should

Re: [PATCH 2/2] scsi: Use new __dma_buffer to align sense buffer in scsi_cmnd

2007-12-21 Thread Benjamin Herrenschmidt
On Fri, 2007-12-21 at 06:16 -0700, Matthew Wilcox wrote: On Fri, Dec 21, 2007 at 10:33:26AM +, Alan Cox wrote: On Fri, 21 Dec 2007 13:30:08 +1100 Benjamin Herrenschmidt [EMAIL PROTECTED] wrote: The sense buffer in scsi_cmnd can nowadays be DMA'ed into directly by some low level

Re: [PATCH RFC 00/77] Re-design MSI/MSI-X interrupts enablement pattern

2013-10-05 Thread Benjamin Herrenschmidt
On Sat, 2013-10-05 at 16:20 +0200, Alexander Gordeev wrote: So my point is - drivers should first obtain a number of MSIs they *can* get, then *derive* a number of MSIs the device is fine with and only then request that number. Not terribly different from memory or any other type of resource

Re: [PATCH RFC 00/77] Re-design MSI/MSI-X interrupts enablement pattern

2013-10-06 Thread Benjamin Herrenschmidt
On Sun, 2013-10-06 at 08:02 +0200, Alexander Gordeev wrote: On Sun, Oct 06, 2013 at 08:46:26AM +1100, Benjamin Herrenschmidt wrote: On Sat, 2013-10-05 at 16:20 +0200, Alexander Gordeev wrote: So my point is - drivers should first obtain a number of MSIs they *can* get, then *derive

Re: [PATCH RFC 00/77] Re-design MSI/MSI-X interrupts enablement pattern

2013-10-07 Thread Benjamin Herrenschmidt
On Mon, 2013-10-07 at 14:01 -0400, Tejun Heo wrote: I don't think the same race condition would happen with the loop. The problem case is where multiple msi(x) allocation fails completely because the global limit went down before inquiry and allocation. In the loop based interface, it'd

Re: [PATCH RFC 00/77] Re-design MSI/MSI-X interrupts enablement pattern

2013-10-08 Thread Benjamin Herrenschmidt
On Tue, 2013-10-08 at 20:55 -0700, H. Peter Anvin wrote: Why not add a minimum number to pci_enable_msix(), i.e.: pci_enable_msix(pdev, msix_entries, nvec, minvec) ... which means nvec is the number of interrupts *requested*, and minvec is the minimum acceptable number (otherwise fail).
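
The requested/minimum semantics proposed above can be sketched as a small pure function. This is a hypothetical userspace model, not the kernel interface: sketch_enable_msix_range and its "available" parameter (standing in for whatever the platform can currently grant) are assumptions made for illustration.

```c
#include <assert.h>

/*
 * Hypothetical model: grant as many vectors as possible, up to nvec,
 * but fail outright if fewer than minvec can be provided.
 * Returns the number of vectors granted, or -1 on failure.
 */
static int sketch_enable_msix_range(int available, int nvec, int minvec)
{
    assert(available >= 0);

    if (minvec < 1 || nvec < minvec)
        return -1;              /* malformed request */
    if (available < minvec)
        return -1;              /* cannot satisfy the minimum */
    return available < nvec ? available : nvec;
}
```

With a single call carrying both bounds, the driver does not need a separate "how many can I get" query followed by a request, which is where the race discussed earlier in the thread would arise.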

Re: [RESEND][PATCH 1/2] lib/scatterlist: Make ARCH_HAS_SG_CHAIN an actual Kconfig

2014-03-22 Thread Benjamin Herrenschmidt
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index 1594945..8122294 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@ -82,6 +82,7 @@ config ARM http://www.arm.linux.org.uk/. config ARM_HAS_SG_CHAIN + select ARCH_HAS_SG_CHAIN bool Heh, a self-selecting

Re: [RESEND][PATCH 1/2] lib/scatterlist: Make ARCH_HAS_SG_CHAIN an actual Kconfig

2014-03-22 Thread Benjamin Herrenschmidt
-generic/scatterlist.h. Cc: Russell King li...@arm.linux.org.uk Cc: Tony Luck tony.l...@intel.com Cc: Fenghua Yu fenghua...@intel.com For powerpc Acked-by: Benjamin Herrenschmidt b...@kernel.crashing.org Cc: Paul Mackerras pau...@samba.org Cc: Ingo Molnar mi...@redhat.com Cc: H. Peter

Re: [RESEND][PATCH 1/2] lib/scatterlist: Make ARCH_HAS_SG_CHAIN an actual Kconfig

2014-03-23 Thread Benjamin Herrenschmidt
On Sun, 2014-03-23 at 00:03 -0700, Christoph Hellwig wrote: On Sun, Mar 23, 2014 at 02:04:46PM +1100, Benjamin Herrenschmidt wrote: diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index 1594945..8122294 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@ -82,6 +82,7

Re: qla_wxyz pci_set_mwi question

2007-04-12 Thread Benjamin Herrenschmidt
On Thu, 2007-04-12 at 14:04 -0600, Matthew Wilcox wrote: On Thu, Apr 12, 2007 at 12:37:13PM -0700, Andrew Vasquez wrote: On Thu, 12 Apr 2007, Matthew Wilcox wrote: Why should it fail? If there's a platform which can't support a cacheline size that the qla2xyz card can handle, it should

Re: qla_wxyz pci_set_mwi question

2007-04-12 Thread Benjamin Herrenschmidt
Willy was referring to this from include/asm-powerpc/pci.h: #ifdef CONFIG_PPC64 /* * We want to avoid touching the cacheline size or MWI bit. * pSeries firmware sets the cacheline size (which is not the cpu cacheline * size in all cases) and hardware treats MWI the same as memory

Re: [patch 6/7] ps3: ROM Storage Driver

2007-05-29 Thread Benjamin Herrenschmidt
On Tue, 2007-05-29 at 13:11 +0200, Geert Uytterhoeven wrote: This looks very inefficient. Just set sg_tablesize of your driver to 1 to avoid getting multiple segments. The disadvantage of setting sg_tablesize = 1 is that the driver will get small requests (PAGE_SIZE) most of the time,

Re: [patch 6/7] ps3: ROM Storage Driver

2007-05-30 Thread Benjamin Herrenschmidt
On Wed, 2007-05-30 at 12:13 +0200, Christoph Hellwig wrote: For any sane hypervisor or hardware the copy should be worth than that. Then again a sane hardware or hypervisor would support SG requests.. Agreed... Sony should fix that, it's a bit ridiculous. Ben.

Re: [patch 1/6] ps3: Preallocate bootmem memory for the PS3 FLASH ROM storage driver

2007-06-15 Thread Benjamin Herrenschmidt
On Fri, 2007-06-15 at 13:39 +0200, Geert Uytterhoeven wrote: plain text document attachment (ps3-stable) Preallocate 256 KiB of bootmem memory for the PS3 FLASH ROM storage driver. I still very much dislike the #ifdef xxx_MODULE in main kernel code. At the end of the day, is it realistic to

Re: [patch 5/6] ps3: BD/DVD/CD-ROM Storage Driver

2007-07-13 Thread Benjamin Herrenschmidt
On Fri, 2007-07-13 at 09:02 -0400, James Bottomley wrote: On Wed, 2007-07-04 at 15:22 +0200, Geert Uytterhoeven wrote: + kaddr = kmap_atomic(sgpnt-page, KM_USER0); + if (!kaddr) + return -1; +

Re: [patch 5/6] ps3: BD/DVD/CD-ROM Storage Driver

2007-07-13 Thread Benjamin Herrenschmidt
On Fri, 2007-07-13 at 16:19 +0200, Arnd Bergmann wrote: I'm pretty sure that no ppc64 machine needs alias resolution in the kernel, although some are VIPT. Last time we discussed this, Segher explained it to me, but I don't remember which way Cell does it. IIRC, it automatically flushes cache

Re: [patch 5/6] ps3: BD/DVD/CD-ROM Storage Driver

2007-07-16 Thread Benjamin Herrenschmidt
Upon closer look, while flush_kernel_dcache_page() is a no-op on ppc64, flush_dcache_page() isn't. So I'd prefer to not call it if not really needed. And according to James, flush_kernel_dcache_page() should be sufficient... So I'm getting puzzled again... flush_dcache_page() handles

Re: [patch 5/6] ps3: BD/DVD/CD-ROM Storage Driver

2007-07-16 Thread Benjamin Herrenschmidt
On Mon, 2007-07-16 at 17:03 -0500, James Bottomley wrote: On Tue, 2007-07-17 at 07:49 +1000, Benjamin Herrenschmidt wrote: No ... that was the point of flush_kernel_dcache_page(). The page in question is page cache backed and contains user mappings. However, the block layer has already

Re: [PATCH 2.6.15.4 rel.2 1/1] libata: add hotswap to sata_svw

2006-11-28 Thread Benjamin Herrenschmidt
On Tue, 2006-11-28 at 23:22 +, David Woodhouse wrote: On Thu, 2006-02-16 at 16:09 +0100, Martin Devera wrote: From: Martin Devera [EMAIL PROTECTED] Add hotswap capability to Serverworks/BroadCom SATA controllers. The controller has SIM register and it selects which bits in SATA_ERROR

Re: [PATCH 18/59] sysctl: ipmi remove unnecessary insert_at_head flag

2007-01-16 Thread Benjamin Herrenschmidt
On Tue, 2007-01-16 at 09:39 -0700, Eric W. Biederman wrote: From: Eric W. Biederman [EMAIL PROTECTED] - unquoted With unique sysctl binary numbers setting insert_at_head is pointless. Signed-off-by: Eric W. Biederman [EMAIL PROTECTED] Acked-by: Benjamin Herrenschmidt [EMAIL PROTECTED

Re: [PATCH 36/59] sysctl: C99 convert ctl_tables entries in arch/ppc/kernel/ppc_htab.c

2007-01-16 Thread Benjamin Herrenschmidt
On Tue, 2007-01-16 at 09:39 -0700, Eric W. Biederman wrote: From: Eric W. Biederman [EMAIL PROTECTED] - unquoted And make the mode of the kernel directory 0555 no one is allowed to write to sysctl directories. Signed-off-by: Eric W. Biederman [EMAIL PROTECTED] Acked-by: Benjamin

Re: [PATCH 35/59] sysctl: C99 convert ctl_tables in arch/powerpc/kernel/idle.c

2007-01-16 Thread Benjamin Herrenschmidt
On Tue, 2007-01-16 at 09:39 -0700, Eric W. Biederman wrote: From: Eric W. Biederman [EMAIL PROTECTED] - unquoted This was partially done already and there was no ABI breakage what a relief. Signed-off-by: Eric W. Biederman [EMAIL PROTECTED] Acked-by: Benjamin Herrenschmidt [EMAIL

Re: [PATCH] scsi/ibmvscsi: /sys/class/scsi_host/hostX/config doesn't show any information

2012-07-26 Thread Benjamin Herrenschmidt
On Wed, 2012-07-18 at 18:49 +0200, o...@aepfle.de wrote: From: Linda Xie lx...@us.ibm.com James, can I assume you're picking up those two ? Cheers, Ben. Expected result: It should show something like this: x1521p4:~ # cat /sys/class/scsi_host/host1/config PARTITIONNAME='x1521p4'

Re: [PATCH] scsi/ibmvscsi: add module alias for ibmvscsic

2012-07-29 Thread Benjamin Herrenschmidt
code is gone the backend abstraction in this driver is no longer necessary, which allows us to consolidate the driver in one file. The side effect is that the module name is now ibmvscsi.ko which matches the driver hotplug name and fixes auto-load issues. Signed-off-by: Benjamin Herrenschmidt b

Re: [PATCH] scsi/ibmvscsi: /sys/class/scsi_host/hostX/config doesn't show any information

2012-07-29 Thread Benjamin Herrenschmidt
-by: Benjamin Herrenschmidt b...@kernel.crashing.org CC: sta...@vger.kernel.org --- diff --git a/drivers/scsi/ibmvscsi/ibmvscsi.c b/drivers/scsi/ibmvscsi/ibmvscsi.c index 3a6c474..337e8b3 100644 --- a/drivers/scsi/ibmvscsi/ibmvscsi.c +++ b/drivers/scsi/ibmvscsi/ibmvscsi.c @@ -1541,6 +1541,9

Re: [PATCH] scsi/ibmvscsi: add module alias for ibmvscsic

2012-07-30 Thread Benjamin Herrenschmidt
On Mon, 2012-07-30 at 21:06 +0200, Olaf Hering wrote: So while this would work, I do wonder however whether we could instead fix it by simplifying the whole thing as follow since iSeries is now gone and so we don't need split backends anymore: scsi/ibmvscsi: Remove backend abstraction

Re: Concerns about mpt2sas: Added Reply Descriptor Post Queue (RDPQ) Array support

2015-02-19 Thread Benjamin Herrenschmidt
On Fri, 2015-02-20 at 16:22 +1100, Benjamin Herrenschmidt wrote: Looking a bit more closely, you basically do - set_dma_mask(64-bit) - set_consistent_dma_mask(32-bit) Now, I don't know how x86 will react to the conflicting masks, but on ppc64, I'm pretty sure the second one will barf

Re: Concerns about mpt2sas: Added Reply Descriptor Post Queue (RDPQ) Array support

2015-02-19 Thread Benjamin Herrenschmidt
On Thu, 2015-02-19 at 21:45 -0800, James Bottomley wrote: Ben, this is legal by design. It was specifically designed for the aic79xx SCSI card, but can be used for a variety of other reasons. The aic79xx hardware problem was that the DMA engine could address the whole of memory (it had two

Concerns about mpt2sas: Added Reply Descriptor Post Queue (RDPQ) Array support

2015-02-19 Thread Benjamin Herrenschmidt
Hi Sreekanth ! While looking at some (unrelated) issue where mpt2sas seems to be using 32-bit DMA instead of 64-bit DMA on some POWER platforms, I noticed this patch which was merged as 5fb1bf8aaa832e1e9ca3198de7bbecb8eff7db9c. Can you confirm my understanding that you are: - Setting the DMA

Re: Concerns about mpt2sas: Added Reply Descriptor Post Queue (RDPQ) Array support

2015-02-19 Thread Benjamin Herrenschmidt
On Fri, 2015-02-20 at 16:06 +1100, Benjamin Herrenschmidt wrote: Note that even on powerpc platforms where it would work because we maintain both 32-bit and 64-bit bypass windows in the device address space simultaneously, you will leak iommu entries unless you also switch back to 32-bit when

Re: Concerns about mpt2sas: Added Reply Descriptor Post Queue (RDPQ) Array support

2015-04-01 Thread Benjamin Herrenschmidt
On Thu, 2015-02-19 at 21:45 -0800, James Bottomley wrote: Ben, this is legal by design. It was specifically designed for the aic79xx SCSI card, but can be used for a variety of other reasons. The aic79xx hardware problem was that the DMA engine could address the whole of memory (it had two

Re: [PATCH v4 2/3] cxlflash: Superpipe support

2015-08-10 Thread Benjamin Herrenschmidt
On Mon, 2015-08-10 at 12:09 -0500, Matthew R. Ochs wrote: Add superpipe supporting infrastructure to device driver for the IBM CXL Flash adapter. This patch allows userspace applications to take advantage of the accelerated I/O features that this adapter provides and bypass the traditional

Re: [PATCH v4 1/3] cxlflash: Base error recovery support

2015-08-10 Thread Benjamin Herrenschmidt
On Mon, 2015-08-10 at 12:09 -0500, Matthew R. Ochs wrote: Introduce support for enhanced I/O error handling. Signed-off-by: Matthew R. Ochs mro...@linux.vnet.ibm.com Signed-off-by: Manoj N. Kumar ma...@linux.vnet.ibm.com --- So I'm not necessarily very qualified to review SCSI bits as I

Re: [PATCH] scsi: lpfc: Add shutdown method for kexec

2017-02-13 Thread Benjamin Herrenschmidt
On Tue, 2017-02-14 at 15:45 +1300, Eric W. Biederman wrote: > The only difference ever that should exist between shutdown and remove > is do you clean up kernel data structures.  The shutdown method is > allowed to skip the cleanup up kernel data structures that the remove > method needs to make.

Re: [PATCH] scsi: lpfc: Add shutdown method for kexec

2017-02-13 Thread Benjamin Herrenschmidt
On Mon, 2017-02-13 at 15:57 -0600, Brian King wrote: > If we do transition to use remove rather than shutdown, I think we > want > some way for a device driver to know whether we are doing kexec or > not. > A RAID adapter with a write cache is going to want to flush its write > cache on a PCI

Re: [PATCH] scsi: lpfc: Add shutdown method for kexec

2017-02-12 Thread Benjamin Herrenschmidt
On Mon, 2017-02-13 at 08:49 +1100, Anton Blanchard wrote: > From: Anton Blanchard > > We see lpfc devices regularly fail during kexec. Fix this by adding > a shutdown method which mirrors the remove method. Or instead finally do what I've been advocating for years (and even

Re: [PATCH] scsi: lpfc: Add shutdown method for kexec

2017-02-12 Thread Benjamin Herrenschmidt
On Mon, 2017-02-13 at 13:21 +1300, Eric W. Biederman wrote: > > Good point, at the very least we should call remove if shutdown doesn't > > exist. Eric: could we make the changes Ben suggests? > > Definitely.  That was the original design of the kexec interface > but people were worried about

[PATCH] scsi/ipr: Fix runaway IRQs when falling back from MSI to LSI

2016-11-23 Thread Benjamin Herrenschmidt
Signed-off-by: Benjamin Herrenschmidt <b...@kernel.crashing.org> --- drivers/scsi/ipr.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/scsi/ipr.c b/drivers/scsi/ipr.c index 5324741..5dd3194 100644 --- a/drivers/scsi/ipr.c +++ b/drivers/scsi/ipr.c @@ -10213,6 +10213,7 @@ static i

Re: [PATCH] ibmvscsi: add write memory barrier to CRQ processing

2016-12-09 Thread Benjamin Herrenschmidt
On Wed, 2016-12-07 at 17:31 -0600, Tyrel Datwyler wrote: > The first byte of each CRQ entry is used to indicate whether an entry is > a valid response or free for the VIOS to use. After processing a > response the driver sets the valid byte to zero to indicate the entry is > now free to be reused.
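
The valid-byte handshake described above can be sketched with C11 atomics as a userspace analogue. The struct layout and the name sketch_release_entry are hypothetical, and the release fence only stands in for a write barrier; the exact placement of the barrier in the actual patch is not visible in this snippet, so the ordering shown here is an assumption.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical queue entry: the first byte marks it valid or free. */
struct sketch_crq_entry {
    _Atomic uint8_t valid;
    uint8_t payload[15];
};

/* Mark an entry free once its response has been processed. */
static void sketch_release_entry(struct sketch_crq_entry *e)
{
    assert(e != NULL);
    atomic_store_explicit(&e->valid, 0, memory_order_relaxed);
    /* Keep the store above ordered before any stores issued after this point. */
    atomic_thread_fence(memory_order_release);
}
```

Without such ordering, a weakly ordered CPU may make later stores visible before the valid byte is cleared, so the other side of the queue could observe an inconsistent state.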

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-12 Thread Benjamin Herrenschmidt
On Thu, 2017-03-30 at 16:12 -0600, Logan Gunthorpe wrote: > Hello, > > As discussed at LSF/MM we'd like to present our work to enable > copy offload support in NVMe fabrics RDMA targets. We'd appreciate > some review and feedback from the community on our direction. > This series is not intended

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-12 Thread Benjamin Herrenschmidt
On Wed, 2017-04-12 at 11:09 -0600, Logan Gunthorpe wrote: > > > Do you handle funky address translation too ? IE. the fact that the PCI > > addresses aren't the same as the CPU physical addresses for a BAR ? > > No, we use the CPU physical address of the BAR. If it's not mapped that > way we

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-14 Thread Benjamin Herrenschmidt
On Fri, 2017-04-14 at 14:04 -0500, Bjorn Helgaas wrote: > I'm a little hesitant about excluding offset support, so I'd like to > hear more about this. > > Is the issue related to PCI BARs that are not completely addressable > by the CPU?  If so, that sounds like a first-class issue that should >

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-16 Thread Benjamin Herrenschmidt
On Sun, 2017-04-16 at 10:34 -0600, Logan Gunthorpe wrote: > > On 16/04/17 09:53 AM, Dan Williams wrote: > > ZONE_DEVICE allows you to redirect via get_dev_pagemap() to retrieve > > context about the physical address in question. I'm thinking you can > > hang bus address translation data off of

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-16 Thread Benjamin Herrenschmidt
On Sun, 2017-04-16 at 08:53 -0700, Dan Williams wrote: > > Just thinking out loud ... I don't have a firm idea or a design. But > > peer to peer is definitely a problem we need to tackle generically, the > > demand for it keeps coming up. > > ZONE_DEVICE allows you to redirect via

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-16 Thread Benjamin Herrenschmidt
On Sun, 2017-04-16 at 10:47 -0600, Logan Gunthorpe wrote: > > I think you need to give other archs a chance to support this with a > > design that considers the offset case as a first class citizen rather > > than an afterthought. > > I'll consider this. Given the fact I can use your existing >

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-17 Thread Benjamin Herrenschmidt
On Sun, 2017-04-16 at 23:13 -0600, Logan Gunthorpe wrote: > > > > > I'm still not 100% why do you need a "p2mem device" mind you ... > > Well, you don't "need" it but it is a design choice that I think makes a > lot of sense for the following reasons: > > 1) p2pmem is in fact a device on the

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-16 Thread Benjamin Herrenschmidt
On Sun, 2017-04-16 at 08:44 -0700, Dan Williams wrote: > The difference is that there was nothing fundamental in the core > design of pmem + DAX that prevented other archs from growing pmem > support. Indeed. In fact we have work in progress support for pmem on power using experimental HW. > THP

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-13 Thread Benjamin Herrenschmidt
On Thu, 2017-04-13 at 15:22 -0600, Logan Gunthorpe wrote: > > On 12/04/17 03:55 PM, Benjamin Herrenschmidt wrote: > > Look at pcibios_resource_to_bus() and pcibios_bus_to_resource(). They > > will perform the conversion between the struct resource content (CPU > > physical
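
The distinction between CPU physical addresses and PCI bus addresses can be sketched as a simple window translation. Everything here is a hypothetical userspace model: struct sketch_window, sketch_cpu_to_bus and the convention that the bus address equals the CPU address minus a per-window offset are assumptions for illustration, not the kernel helpers named above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical host bridge window: a CPU address range and its bus offset. */
struct sketch_window {
    uint64_t cpu_start;   /* first CPU physical address of the window */
    uint64_t size;        /* window length in bytes */
    uint64_t offset;      /* assumed: bus address = CPU address - offset */
};

/* Translate a CPU physical address to a bus address if it lies in the window. */
static bool sketch_cpu_to_bus(const struct sketch_window *w,
                              uint64_t cpu, uint64_t *bus)
{
    assert(w != NULL && bus != NULL);

    if (cpu < w->cpu_start || cpu - w->cpu_start >= w->size)
        return false;
    *bus = cpu - w->offset;
    return true;
}
```

When the offset is zero the two address spaces coincide, which is the case code tends to assume silently; a peer device must be handed the bus address, not the CPU one, whenever the offset is non-zero.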

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-14 Thread Benjamin Herrenschmidt
On Fri, 2017-04-14 at 21:37 +1000, Benjamin Herrenschmidt wrote: > On Thu, 2017-04-13 at 22:40 -0600, Logan Gunthorpe wrote: > > > > On 13/04/17 10:16 PM, Jason Gunthorpe wrote: > > > I'd suggest just detecting if there is any translation in bus > > > addresses

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-14 Thread Benjamin Herrenschmidt
On Thu, 2017-04-13 at 22:16 -0600, Jason Gunthorpe wrote: > > Any caller of pci_add_resource_offset() uses CPU addresses different from > > the PCI bus addresses (unless the offset is zero, of course).  All ACPI > > platforms also support this translation (see "translation_offset"), though > > in

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-14 Thread Benjamin Herrenschmidt
On Thu, 2017-04-13 at 22:40 -0600, Logan Gunthorpe wrote: > > On 13/04/17 10:16 PM, Jason Gunthorpe wrote: > > I'd suggest just detecting if there is any translation in bus > > addresses anywhere and just hard disabling P2P on such systems. > > That's a fantastic suggestion. It simplifies things

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-15 Thread Benjamin Herrenschmidt
On Sat, 2017-04-15 at 11:41 -0600, Logan Gunthorpe wrote: > Thanks, Benjamin, for the summary of some of the issues. > > On 14/04/17 04:07 PM, Benjamin Herrenschmidt wrote > > So I assume the p2p code provides a way to address that too via special > > dma_ops ? Or wrappers ?

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-15 Thread Benjamin Herrenschmidt
On Sat, 2017-04-15 at 15:09 -0700, Dan Williams wrote: > I'm wondering, since this is limited to support behind a single > switch, if you could have a software-iommu hanging off that switch > device object that knows how to catch and translate the non-zero > offset bus address case. We have

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-18 Thread Benjamin Herrenschmidt
On Mon, 2017-04-17 at 23:43 -0600, Logan Gunthorpe wrote: > > On 17/04/17 03:11 PM, Benjamin Herrenschmidt wrote: > > Is it ? Again, you create a "concept" the user may have no idea about, > > "p2pmem memory". So now any kind of memory buffer on a device

Re: [PATCH] scsi: lpfc: Add shutdown method for kexec

2017-03-06 Thread Benjamin Herrenschmidt
On Mon, 2017-03-06 at 22:46 -0500, Martin K. Petersen wrote: > > > > > > "Mauricio" == Mauricio Faria de Oliveira > > > > > et.ibm.com> writes: > > Mauricio> On 02/12/2017 07:49 PM, Anton Blanchard wrote: > > > We see lpfc devices regularly fail during kexec. Fix this by > >

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-17 Thread Benjamin Herrenschmidt
On Mon, 2017-04-17 at 10:52 -0600, Logan Gunthorpe wrote: > > On 17/04/17 01:20 AM, Benjamin Herrenschmidt wrote: > > But is it ? For example take a GPU, does it, in your scheme, need an > > additional "p2pmem" child ? Why can't the GPU driver just use some > > h

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-18 Thread Benjamin Herrenschmidt
On Tue, 2017-04-18 at 12:00 -0600, Jason Gunthorpe wrote: > - All platforms can succeed if the PCI devices are under the same >   'segment', but where segments begin is somewhat platform specific >   knowledge. (this is 'same switch' idea Logan has talked about) We also need to be careful whether

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-18 Thread Benjamin Herrenschmidt
On Tue, 2017-04-18 at 17:21 -0600, Jason Gunthorpe wrote: > Splitting the sgl is different from iommu batching. > > As an example, an O_DIRECT write of 1 MB with a single 4K P2P page in > the middle. > > The optimum behavior is to allocate a 1MB-4K iommu range and fill it > with the CPU memory.

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-18 Thread Benjamin Herrenschmidt
On Tue, 2017-04-18 at 16:24 -0600, Jason Gunthorpe wrote: > Basically, all this list processing is a huge overhead compared to > just putting a helper call in the existing sg iteration loop of the > actual op.  Particularly if the actual op is a no-op like no-mmu x86 > would use. Yes, I'm leaning

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-18 Thread Benjamin Herrenschmidt
On Tue, 2017-04-18 at 15:03 -0600, Jason Gunthorpe wrote: > I don't follow, when does get_dma_ops() return a p2p aware provider? > It has no way to know if the DMA is going to involve p2p, get_dma_ops > is called with the device initiating the DMA. > > So you'd always return the P2P shim on a

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-18 Thread Benjamin Herrenschmidt
On Tue, 2017-04-18 at 14:48 -0600, Logan Gunthorpe wrote: > > ...and that dma_map goes through get_dma_ops(), so I don't see the conflict? > > The main conflict is in dma_map_sg which only does get_dma_ops once but > the sg may contain memory of different types. We can handle that in our

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-18 Thread Benjamin Herrenschmidt
On Tue, 2017-04-18 at 15:22 -0600, Jason Gunthorpe wrote: > On Tue, Apr 18, 2017 at 02:11:33PM -0700, Dan Williams wrote: > > > I think this opens an even bigger can of worms.. > > > > No, I don't think it does. You'd only shim when the target page is > > backed by a device, not host memory, and

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-18 Thread Benjamin Herrenschmidt
On Tue, 2017-04-18 at 10:27 -0700, Dan Williams wrote: > > FWIW, RDMA probably wouldn't want to use a p2mem device either, we > > already have APIs that map BAR memory to user space, and would like to > > keep using them. A 'enable P2P for bar' helper function sounds better > > to me. > > ...and