Re: [PATCH 1/1 v4] drivers/nvme: default to 4k device page size

2015-11-06 Thread Nishanth Aravamudan
On 05.11.2015 [11:58:39 -0800], Christoph Hellwig wrote: > Looks fine, > > Reviewed-by: Christoph Hellwig > > ... but I doubt we'll ever bother updating it. Most architectures > with larger page sizes also have iommus and would need different settings > for different iommus vs

[PATCH 1/1 v4] drivers/nvme: default to 4k device page size

2015-11-05 Thread Nishanth Aravamudan
On 03.11.2015 [13:46:25 +], Keith Busch wrote: > On Tue, Nov 03, 2015 at 05:18:24AM -0800, Christoph Hellwig wrote: > > On Fri, Oct 30, 2015 at 02:35:11PM -0700, Nishanth Aravamudan wrote: > > > diff --git a/drivers/block/nvme-core.c b/drivers/block/nvme-core.c > &

Re: [PATCH 1/1 v4] drivers/nvme: default to 4k device page size

2015-11-05 Thread Nishanth Aravamudan
On 05.11.2015 [11:58:39 -0800], Christoph Hellwig wrote: > Looks fine, > > Reviewed-by: Christoph Hellwig > > ... but I doubt we'll ever bother updating it. Most architectures > with larger page sizes also have iommus and would need different settings > for different iommus vs

[PATCH 1/1 v3] drivers/nvme: default to 4k device page size

2015-10-30 Thread Nishanth Aravamudan
On 29.10.2015 [17:20:43 +], Busch, Keith wrote: > On Thu, Oct 29, 2015 at 08:57:01AM -0700, Nishanth Aravamudan wrote: > > On 29.10.2015 [04:55:36 -0700], Christoph Hellwig wrote: > > > We had a quick chat about this issue and I think we simply should > > > default to

Re: [PATCH 1/1 v3] drivers/nvme: default to 4k device page size

2015-10-30 Thread Nishanth Aravamudan
On 30.10.2015 [21:48:48 +], Keith Busch wrote: > On Fri, Oct 30, 2015 at 02:35:11PM -0700, Nishanth Aravamudan wrote: > > Given that it's 4K just about everywhere by default (and sort of > > implicitly expected to be, I guess), I think I'd prefer we default to > > 4K.

Re: [PATCH 0/5 v3] Fix NVMe driver support on Power with 32-bit DMA

2015-10-30 Thread Nishanth Aravamudan
On 29.10.2015 [18:49:55 -0700], David Miller wrote: > From: Nishanth Aravamudan <n...@linux.vnet.ibm.com> > Date: Thu, 29 Oct 2015 08:57:01 -0700 > > > So, would that imply changing just the NVMe driver code rather than > > adding the dma_page_shift API at all? W

Re: [PATCH 0/5 v3] Fix NVMe driver support on Power with 32-bit DMA

2015-10-29 Thread Nishanth Aravamudan
On 29.10.2015 [04:55:36 -0700], Christoph Hellwig wrote: > On Wed, Oct 28, 2015 at 01:59:23PM +, Busch, Keith wrote: > > The "new" interface for all the other architectures is the same as the > > old one we've been using for the last 5 years. > > > > I welcome x86 maintainer feedback to

Re: [PATCH 0/5 v3] Fix NVMe driver support on Power with 32-bit DMA

2015-10-27 Thread Nishanth Aravamudan
On 26.10.2015 [18:27:46 -0700], David Miller wrote: > From: Nishanth Aravamudan <n...@linux.vnet.ibm.com> > Date: Fri, 23 Oct 2015 13:54:20 -0700 > > > 1) add a generic dma_get_page_shift implementation that just returns > > PAGE_SHIFT > > I won't object t

Re: [PATCH 4/7 v2] pseries/iommu: implement DDW-aware dma_get_page_shift

2015-10-27 Thread Nishanth Aravamudan
On 27.10.2015 [16:56:10 +1100], Alexey Kardashevskiy wrote: > On 10/24/2015 07:59 AM, Nishanth Aravamudan wrote: > >When DDW (Dynamic DMA Windows) are present for a device, we have stored > >the TCE (Translation Control Entry) size in a special device tree > >property. Check i

Re: [PATCH 2/7 v2] powerpc/dma-mapping: override dma_get_page_shift

2015-10-27 Thread Nishanth Aravamudan
On 27.10.2015 [17:02:16 +1100], Alexey Kardashevskiy wrote: > On 10/24/2015 07:57 AM, Nishanth Aravamudan wrote: > >On Power, the kernel's page size can differ from the IOMMU's page size, > >so we need to override the generic implementation, which always returns > >the kerne

Re: [PATCH 0/5 v3] Fix NVMe driver support on Power with 32-bit DMA

2015-10-27 Thread Nishanth Aravamudan
On 28.10.2015 [09:57:48 +1100], Julian Calaby wrote: > Hi Nishanth, > > On Wed, Oct 28, 2015 at 9:20 AM, Nishanth Aravamudan > <n...@linux.vnet.ibm.com> wrote: > > On 26.10.2015 [18:27:46 -0700], David Miller wrote: > >> From: Nishanth Aravamudan <n...@linux.

Re: [PATCH 2/7 v2] powerpc/dma-mapping: override dma_get_page_shift

2015-10-27 Thread Nishanth Aravamudan
On 28.10.2015 [12:00:20 +1100], Alexey Kardashevskiy wrote: > On 10/28/2015 09:27 AM, Nishanth Aravamudan wrote: > >On 27.10.2015 [17:02:16 +1100], Alexey Kardashevskiy wrote: > >>On 10/24/2015 07:57 AM, Nishanth Aravamudan wrote: > >>>On Power, the kernel's page si

Re: [PATCH 0/5 v3] Fix NVMe driver support on Power with 32-bit DMA

2015-10-27 Thread Nishanth Aravamudan
On 27.10.2015 [17:53:22 -0700], David Miller wrote: > From: Nishanth Aravamudan <n...@linux.vnet.ibm.com> > Date: Tue, 27 Oct 2015 15:20:10 -0700 > > > Well, looks like I should spin up a v4 anyways for the powerpc changes. > > So, to make sure I understand your point

Re: [PATCH 2/7 v2] powerpc/dma-mapping: override dma_get_page_shift

2015-10-27 Thread Nishanth Aravamudan
On 28.10.2015 [11:20:05 +0900], Benjamin Herrenschmidt wrote: > On Tue, 2015-10-27 at 18:54 -0700, Nishanth Aravamudan wrote: > > > > In "bypass" mode, what TCE size is used? Is it guaranteed to be 4K? > > None :-) The TCEs are completely bypassed. You get a N:M

[PATCH 0/5 v3] Fix NVMe driver support on Power with 32-bit DMA

2015-10-23 Thread Nishanth Aravamudan
We received a bug report recently when DDW (64-bit direct DMA on Power) is not enabled for NVMe devices. In that case, we fall back to 32-bit DMA via the IOMMU, which is always done via 4K TCEs (Translation Control Entries). The NVMe device driver, though, assumes that the DMA alignment for the

Re: [PATCH 0/5 v3] Fix NVMe driver support on Power with 32-bit DMA

2015-10-23 Thread Nishanth Aravamudan
[Sorry, subject should have been 0/7!] On 23.10.2015 [13:54:20 -0700], Nishanth Aravamudan wrote: > We received a bug report recently when DDW (64-bit direct DMA on Power) > is not enabled for NVMe devices. In that case, we fall back to 32-bit > DMA via the IOMMU, which is always done vi

[PATCH 7/7 v2] drivers/nvme: default to the IOMMU page size

2015-10-23 Thread Nishanth Aravamudan
page size, rather than the kernel's page size. With this patch, a NVMe device survives our internal hardware exerciser; the kernel BUGs within a few seconds without the patch. Signed-off-by: Nishanth Aravamudan <n...@linux.vnet.ibm.com> --- v1 -> v2: Based upon feedback from Christop

[PATCH 2/7 v2] powerpc/dma-mapping: override dma_get_page_shift

2015-10-23 Thread Nishanth Aravamudan
-by: Nishanth Aravamudan <n...@linux.vnet.ibm.com> --- arch/powerpc/include/asm/dma-mapping.h | 3 +++ arch/powerpc/kernel/dma.c | 9 + 2 files changed, 12 insertions(+) diff --git a/arch/powerpc/include/asm/dma-mapping.h b/arch/powerpc/include/asm/dma-mapping.h index 7

Re: [PATCH 5/7] [RFC PATCH 5/7] sparc: rename kernel/iommu_common.h -> include/asm/iommu_common.h

2015-10-23 Thread Nishanth Aravamudan
[Apologies for the subject line, should just have the [RFC PATCH 5/7]] On 23.10.2015 [14:00:08 -0700], Nishanth Aravamudan wrote: > In order to cleanly expose the desired IOMMU page shift via the new > dma_get_page_shift API, we need to have the sparc constants available in > a mor

[PATCH 3/7 v2] powerpc/dma: implement per-platform dma_get_page_shift

2015-10-23 Thread Nishanth Aravamudan
. DDW is a pseries-specific feature, so allow platforms to override the implementation of dma_get_page_shift if desired. Signed-off-by: Nishanth Aravamudan <n...@linux.vnet.ibm.com> --- arch/powerpc/include/asm/machdep.h | 3 ++- arch/powerpc/kernel/dma.c | 2 ++ 2 files changed, 4 inse

[PATCH 1/7 v3] dma-mapping: add generic dma_get_page_shift API

2015-10-23 Thread Nishanth Aravamudan
Drivers like NVMe need to be able to determine the page size used for DMA transfers. Add a new API that defaults to return PAGE_SHIFT on all architectures. Signed-off-by: Nishanth Aravamudan <n...@linux.vnet.ibm.com> --- v1 -> v2: Based upon feedback from Christoph Hellwig, implement
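
The default-plus-override shape described in the dma_get_page_shift patches can be sketched in plain userspace C. This is only an illustration: the struct, the hook, the `sketch_` names and the 64K page size are all assumptions made for the example, not the kernel's definitions or the code from the series.

```c
#include <stddef.h>

/* Hypothetical stand-in for the kernel's PAGE_SHIFT; assumes 64K pages. */
#define SKETCH_PAGE_SHIFT 16

/* Hypothetical device model, not the kernel's struct device. */
struct sketch_device {
    /* Optional platform hook; NULL means "use the generic default". */
    unsigned long (*get_page_shift)(const struct sketch_device *dev);
    /* Page shift used by the IOMMU, e.g. 12 for 4K TCEs. */
    unsigned long iommu_page_shift;
};

/* Generic default: the kernel page shift, unless a platform hook is set. */
static unsigned long sketch_dma_get_page_shift(const struct sketch_device *dev)
{
    if (dev != NULL && dev->get_page_shift != NULL)
        return dev->get_page_shift(dev);
    return SKETCH_PAGE_SHIFT;
}

/* Example platform override: report the IOMMU's page shift instead. */
static unsigned long sketch_iommu_page_shift(const struct sketch_device *dev)
{
    return dev->iommu_page_shift;
}
```

With the hook left NULL the sketch reports 16 (64K); with the hook set to `sketch_iommu_page_shift` and `iommu_page_shift` of 12 it reports 12 (4K), which is the kind of value a driver would then use to size its device pages.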

[PATCH 4/7 v2] pseries/iommu: implement DDW-aware dma_get_page_shift

2015-10-23 Thread Nishanth Aravamudan
the value up in struct iommu_table. If we don't find an iommu_table, fall back to the kernel's page size. Signed-off-by: Nishanth Aravamudan <n...@linux.vnet.ibm.com> --- arch/powerpc/platforms/pseries/iommu.c | 36 ++ 1 file changed, 36 insertions(+) diff

[RFC PATCH 6/7] sparc/dma-mapping: override dma_get_page_shift

2015-10-23 Thread Nishanth Aravamudan
On sparc, the kernel's page size differs from the IOMMU's page size, so override the generic implementation, which always returns the kernel's page size, and return IOMMU_PAGE_SHIFT instead. Signed-off-by: Nishanth Aravamudan <n...@linux.vnet.ibm.com> --- I know very little about spa

[PATCH 5/7] [RFC PATCH 5/7] sparc: rename kernel/iommu_common.h -> include/asm/iommu_common.h

2015-10-23 Thread Nishanth Aravamudan
In order to cleanly expose the desired IOMMU page shift via the new dma_get_page_shift API, we need to have the sparc constants available in a more typical location. There should be no functional impact to this move, but it is untested. Signed-off-by: Nishanth Aravamudan <n...@linux.vnet.ibm.

Re: [PATCH 1/5 v2] dma-mapping: add generic dma_get_page_shift API

2015-10-19 Thread Nishanth Aravamudan
On 15.10.2015 [15:52:19 -0700], Nishanth Aravamudan wrote: > On 14.10.2015 [08:42:51 -0700], Christoph Hellwig wrote: > > Hi Nishanth, > > > > sorry for the late reply. > > > > > > On Power, since it's technically variable, we'd need a function. S

Re: [PATCH 1/5 v2] dma-mapping: add generic dma_get_page_shift API

2015-10-15 Thread Nishanth Aravamudan
On 14.10.2015 [08:42:51 -0700], Christoph Hellwig wrote: > Hi Nishanth, > > sorry for the late reply. > > > > On Power, since it's technically variable, we'd need a function. So are > > > you suggesting define'ing it to a function just on Power and leaving it > > > a constant elsewhere? > > > >

Re: [PATCH 1/5 v2] dma-mapping: add generic dma_get_page_shift API

2015-10-14 Thread Nishanth Aravamudan
Hi Christoph, On 12.10.2015 [14:06:51 -0700], Nishanth Aravamudan wrote: > On 06.10.2015 [02:51:36 -0700], Christoph Hellwig wrote: > > Do we need a function here or can we just have a IOMMU_PAGE_SHIFT define > > with an #ifndef in common code? > > On Power, since it's techn

Re: [PATCH 1/5 v2] dma-mapping: add generic dma_get_page_shift API

2015-10-12 Thread Nishanth Aravamudan
On 06.10.2015 [02:51:36 -0700], Christoph Hellwig wrote: > Do we need a function here or can we just have a IOMMU_PAGE_SHIFT define > with an #ifndef in common code? I suppose we could do that -- I wasn't sure if the macro would be palatable. > Also not all architectures use dma-mapping-common.h

Re: [PATCH 1/2] powerpc/iommu: expose IOMMU page shift

2015-10-12 Thread Nishanth Aravamudan
On 06.10.2015 [14:19:43 +1100], David Gibson wrote: > On Fri, Oct 02, 2015 at 10:18:00AM -0700, Nishanth Aravamudan wrote: > > We will leverage this macro in the NVMe driver, which needs to know the > > configured IOMMU page shift to properly configure its device's page > >

Re: [PATCH 1/5 v2] dma-mapping: add generic dma_get_page_shift API

2015-10-12 Thread Nishanth Aravamudan
On 06.10.2015 [02:51:36 -0700], Christoph Hellwig wrote: > Do we need a function here or can we just have a IOMMU_PAGE_SHIFT define > with an #ifndef in common code? On Power, since it's technically variable, we'd need a function. So are you suggesting define'ing it to a function just on Power

Re: [PATCH 1/2] powerpc/iommu: expose IOMMU page shift

2015-10-12 Thread Nishanth Aravamudan
On 12.10.2015 [09:03:52 -0700], Nishanth Aravamudan wrote: > On 06.10.2015 [14:19:43 +1100], David Gibson wrote: > > On Fri, Oct 02, 2015 at 10:18:00AM -0700, Nishanth Aravamudan wrote: > > > We will leverage this macro in the NVMe driver, which needs to know the > > >

Re: [PATCH 2/2] drivers/nvme: default to the IOMMU page size on Power

2015-10-02 Thread Nishanth Aravamudan
On 02.10.2015 [10:25:44 -0700], Christoph Hellwig wrote: > Hi Nishanth, > > please expose this value through the generic DMA API instead of adding > architecture specific hacks to drivers. Ok, I'm happy to do that instead -- what I struggled with is that I don't have enough knowledge of the

[PATCH 0/2] Fix NVMe driver support on Power with 32-bit DMA

2015-10-02 Thread Nishanth Aravamudan
We received a bug report recently when DDW (64-bit direct DMA on Power) is not enabled for NVMe devices. In that case, we fall back to 32-bit DMA via the IOMMU, which is always done via 4K TCEs (Translation Control Entries). The NVMe device driver, though, assumes that the DMA alignment for the

[PATCH 2/2] drivers/nvme: default to the IOMMU page size on Power

2015-10-02 Thread Nishanth Aravamudan
exerciser; the kernel BUGs within a few seconds without the patch. Signed-off-by: Nishanth Aravamudan <n...@linux.vnet.ibm.com> diff --git a/drivers/block/nvme-core.c b/drivers/block/nvme-core.c index 7920c27..969a95e 100644 --- a/drivers/block/nvme-core.c +++ b/drivers/block/nvme-core.c @@

[PATCH 1/2] powerpc/iommu: expose IOMMU page shift

2015-10-02 Thread Nishanth Aravamudan
We will leverage this macro in the NVMe driver, which needs to know the configured IOMMU page shift to properly configure its device's page size. Signed-off-by: Nishanth Aravamudan <n...@linux.vnet.ibm.com> --- Given this is available, it seems reasonable to expose -- and it doesn't reall

[PATCH 3/5 v2] powerpc/dma: implement per-platform dma_get_page_shift

2015-10-02 Thread Nishanth Aravamudan
. DDW is a pseries-specific feature, so allow platforms to override the implementation of dma_get_page_shift if desired. Signed-off-by: Nishanth Aravamudan <n...@linux.vnet.ibm.com> diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h index cab6753..5c372e3

[PATCH 0/5 v2] Fix NVMe driver support on Power with 32-bit DMA

2015-10-02 Thread Nishanth Aravamudan
We received a bug report recently when DDW (64-bit direct DMA on Power) is not enabled for NVMe devices. In that case, we fall back to 32-bit DMA via the IOMMU, which is always done via 4K TCEs (Translation Control Entries). The NVMe device driver, though, assumes that the DMA alignment for the

[PATCH 4/5 v2] pseries/iommu: implement DDW-aware dma_get_page_shift

2015-10-02 Thread Nishanth Aravamudan
the value up in struct iommu_table. If we don't find an iommu_table, fall back to the kernel's page size. Signed-off-by: Nishanth Aravamudan <n...@linux.vnet.ibm.com> diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c index 0946b98..1bf6471

[PATCH 2/5 v2] powerpc/dma-mapping: override dma_get_page_shift

2015-10-02 Thread Nishanth Aravamudan
-by: Nishanth Aravamudan <n...@linux.vnet.ibm.com> diff --git a/arch/powerpc/include/asm/dma-mapping.h b/arch/powerpc/include/asm/dma-mapping.h index 7f522c0..c5638f4 100644 --- a/arch/powerpc/include/asm/dma-mapping.h +++ b/arch/powerpc/include/asm/dma-mapping.h @@ -125,6 +125,9 @@ static inlin

[PATCH 1/5 v2] dma-mapping: add generic dma_get_page_shift API

2015-10-02 Thread Nishanth Aravamudan
Drivers like NVMe need to be able to determine the page size used for DMA transfers. Add a new API that defaults to return PAGE_SHIFT on all architectures. Signed-off-by: Nishanth Aravamudan <n...@linux.vnet.ibm.com> diff --git a/include/asm-generic/dma-mapping-common.h b/include/asm-g

[PATCH 5/5 v2] drivers/nvme: default to the IOMMU page size

2015-10-02 Thread Nishanth Aravamudan
We received a bug report recently when DDW (64-bit direct DMA on Power) is not enabled for NVMe devices. In that case, we fall back to 32-bit DMA via the IOMMU, which is always done via 4K TCEs (Translation Control Entries). The NVMe device driver, though, assumes that the DMA alignment for the

Re: [PATCH 0/5 v2] Fix NVMe driver support on Power with 32-bit DMA

2015-10-02 Thread Nishanth Aravamudan
On 03.10.2015 [07:35:09 +1000], Benjamin Herrenschmidt wrote: > On Fri, 2015-10-02 at 14:04 -0700, Nishanth Aravamudan wrote: > > Right, I did start with your advice and tried that approach, but it > > turned out I was wrong about the actual issue at the time. The problem >

Re: [PATCH 0/5 v2] Fix NVMe driver support on Power with 32-bit DMA

2015-10-02 Thread Nishanth Aravamudan
On 03.10.2015 [06:51:06 +1000], Benjamin Herrenschmidt wrote: > On Fri, 2015-10-02 at 13:09 -0700, Nishanth Aravamudan wrote: > > > 1) add a generic dma_get_page_shift implementation that just returns > > PAGE_SHIFT > > So you chose to return the granularity of the iomm

Re: [PATCH RFC 0/5] powerpc:numa Add serial nid support

2015-09-28 Thread Nishanth Aravamudan
On 28.09.2015 [13:44:42 +0300], Denis Kirjanov wrote: > On 9/27/15, Raghavendra K T wrote: > > Problem description: > > Powerpc has sparse node numbering, i.e. on a 4 node system nodes are > > numbered (possibly) as 0,1,16,17. At a lower level, we map the chipid

Re: [PATCH RFC 2/5] powerpc:numa Rename functions referring to nid as chipid

2015-09-28 Thread Nishanth Aravamudan
On 27.09.2015 [23:59:10 +0530], Raghavendra K T wrote: > There is no change in the functionality > > Signed-off-by: Raghavendra K T > --- > arch/powerpc/mm/numa.c | 42 +- > 1 file changed, 21 insertions(+), 21

Re: [PATCH RFC 4/5] powerpc:numa Add helper functions to maintain chipid to nid mapping

2015-09-28 Thread Nishanth Aravamudan
On 27.09.2015 [23:59:12 +0530], Raghavendra K T wrote: > Create arrays that map serial nids and sparse chipids. > > Note: My original idea had only two arrays of chipid to nid map. Final > code is inspired by driver/acpi/numa.c that maps a proximity node with > a logical node by Takayoshi Kochi
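
The idea of mapping sparse chipids onto serial nids with a pair of lookup arrays can be sketched as below. This is a userspace illustration under stated assumptions: the array bound, the `sketch_` names and the first-come numbering policy are invented for the example and are not taken from the patch.

```c
/* Hypothetical limits and sentinel, chosen only for this sketch. */
#define SKETCH_MAX_CHIPS 32
#define SKETCH_NO_NODE   (-1)

static int sketch_chipid_to_nid[SKETCH_MAX_CHIPS];
static int sketch_nid_to_chipid[SKETCH_MAX_CHIPS];
static int sketch_next_nid;

/* Mark every slot unmapped; must be called before the first mapping. */
static void sketch_map_reset(void)
{
    for (int i = 0; i < SKETCH_MAX_CHIPS; i++) {
        sketch_chipid_to_nid[i] = SKETCH_NO_NODE;
        sketch_nid_to_chipid[i] = SKETCH_NO_NODE;
    }
    sketch_next_nid = 0;
}

/*
 * Return the serial nid for a chipid, assigning the next free nid the
 * first time a chipid is seen. Out-of-range chipids get SKETCH_NO_NODE.
 */
static int sketch_map_chipid(int chipid)
{
    if (chipid < 0 || chipid >= SKETCH_MAX_CHIPS)
        return SKETCH_NO_NODE;
    if (sketch_chipid_to_nid[chipid] == SKETCH_NO_NODE) {
        sketch_chipid_to_nid[chipid] = sketch_next_nid;
        sketch_nid_to_chipid[sketch_next_nid] = chipid;
        sketch_next_nid++;
    }
    return sketch_chipid_to_nid[chipid];
}
```

Feeding the sparse ids 0, 1, 16, 17 from the problem description through `sketch_map_chipid()` in that order yields the serial nids 0, 1, 2, 3, and the reverse array lets a nid be translated back to its chipid.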

Re: [PATCH RFC 0/5] powerpc:numa Add serial nid support

2015-09-28 Thread Nishanth Aravamudan
On 27.09.2015 [23:59:08 +0530], Raghavendra K T wrote: > Problem description: > Powerpc has sparse node numbering, i.e. on a 4 node system nodes are > numbered (possibly) as 0,1,16,17. At a lower level, the chipid > got from device tree is naturally mapped (directly) to nid. chipid is a

Re: [PATCH RFC 3/5] powerpc:numa create 1:1 mappaing between chipid and nid

2015-09-28 Thread Nishanth Aravamudan
On 27.09.2015 [23:59:11 +0530], Raghavendra K T wrote: > Once we have made the distinction between nid and chipid > create a 1:1 mapping between them. This makes compacting the > nids easy later. > > No functionality change. > > Signed-off-by: Raghavendra K T

Re: [PATCH RFC 3/5] powerpc:numa create 1:1 mappaing between chipid and nid

2015-09-28 Thread Nishanth Aravamudan
On 27.09.2015 [23:59:11 +0530], Raghavendra K T wrote: > Once we have made the distinction between nid and chipid > create a 1:1 mapping between them. This makes compacting the > nids easy later. Didn't the previous patch just do the opposite of... > @@ -286,7 +308,7 @@ int

Re: [RFC] powerpc/hugetlb: Add warning message when gpage allocation request fails

2015-09-14 Thread Nishanth Aravamudan
On 14.09.2015 [18:59:25 +0530], Aneesh Kumar K.V wrote: > Anshuman Khandual writes: > > > When a 16GB huge page is requested on POWER platform through kernel command > > line interface, it silently fails because of the lack of any gigantic pages > > on the system

Re: [PATCH v2] powerpc/powernv/pci-ioda: fix kdump with non-power-of-2 crashkernel=

2015-09-07 Thread Nishanth Aravamudan
On 07.09.2015 [19:19:09 +1000], Michael Ellerman wrote: > On Fri, 2015-09-04 at 11:22 -0700, Nishanth Aravamudan wrote: > > The 32-bit TCE table initialization relies on the DMA window having a > > size equal to a power of 2 (and checks for it explicitly). But > > crashkern
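
The power-of-2 requirement mentioned in this thread can be illustrated with a small userspace sketch. Both helpers below are written for the example (the `sketch_` names are not kernel functions), and rounding down is shown only as one way to obtain a size that passes such a check; the snippet above does not show which approach the patch actually takes.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when exactly one bit is set, i.e. n is a power of two. */
static bool sketch_is_power_of_2(uint64_t n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/* Largest power of two that is <= n; returns 0 for n == 0. */
static uint64_t sketch_rounddown_pow_of_two(uint64_t n)
{
    uint64_t p = 1;

    if (n == 0)
        return 0;
    /* Comparing against n / 2 avoids overflowing p on the last shift. */
    while (p <= n / 2)
        p <<= 1;
    return p;
}
```

For example, a hypothetical 3 GiB region fails the check, while rounding it down gives 2 GiB, which passes.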

[PATCH] powerpc/powernv/pci-ioda: fix kdump with non-power-of-2 crashkernel=

2015-09-04 Thread Nishanth Aravamudan
controller) are successfully initialized. After this change, the PCI devices successfully set up the 32-bit TCE table and kdump succeeds. Fixes: aca6913f5551 ("powerpc/powernv/ioda2: Introduce helpers to allocate TCE pages") Signed-off-by: Nishanth Aravamudan <n...@linux.vnet.ibm

[PATCH v2] powerpc/powernv/pci-ioda: fix kdump with non-power-of-2 crashkernel=

2015-09-04 Thread Nishanth Aravamudan
controller) are successfully initialized. After this change, the PCI devices successfully set up the 32-bit TCE table and kdump succeeds. Fixes: aca6913f5551 ("powerpc/powernv/ioda2: Introduce helpers to allocate TCE pages") Signed-off-by: Nishanth Aravamudan <n...@linux.vnet.ibm

Re: powerpc/powernv/pci-ioda: fix kdump with non-power-of-2 crashkernel=

2015-09-04 Thread Nishanth Aravamudan
On 04.09.2015 [20:01:22 +0200], Jan Stancek wrote: > On Fri, Sep 04, 2015 at 09:59:38AM -0700, Nishanth Aravamudan wrote: > > The 32-bit TCE table initialization relies on the DMA window having a > > size equal to a power of 2 (and checks for it explicitly). But > > crashkern

Re: [PATCH v2] powerpc/powernv/pci-ioda: fix 32-bit TCE table init in kdump kernel

2015-09-03 Thread Nishanth Aravamudan
On 03.09.2015 [19:58:53 +1000], Michael Ellerman wrote: > On Wed, 2015-09-02 at 08:39 -0700, Nishanth Aravamudan wrote: > > On 02.09.2015 [19:00:31 +1000], Alexey Kardashevskiy wrote: > > > On 09/02/2015 11:11 AM, Nishanth Aravamudan wrote: > > > >diff --git a/arch

[PATCH v2] powerpc/powernv/pci-ioda: fix 32-bit TCE table init in kdump kernel

2015-09-02 Thread Nishanth Aravamudan
On 02.09.2015 [19:00:31 +1000], Alexey Kardashevskiy wrote: > On 09/02/2015 11:11 AM, Nishanth Aravamudan wrote: > >When attempting to kdump with the 4.2 kernel, we see for each PCI > >device: > > > > pci 0003:01 : [PE# 000] Assign DMA32 space > > pci 0003

[PATCH] powerpc/powernv/pci-ioda: fix 32-bit TCE table init in kdump kernel

2015-09-01 Thread Nishanth Aravamudan
ump succeeds. The problem was seen on a Firestone machine originally. Fixes: aca6913f5551 ("powerpc/powernv/ioda2: Introduce helpers to allocate TCE pages") Signed-off-by: Nishanth Aravamudan <n...@linux.vnet.ibm.com> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arc

Re: [PATCH] openvswitch: make for_each_node loops work with sparse numa systems

2015-07-21 Thread Nishanth Aravamudan
On 21.07.2015 [10:32:34 -0500], Chris J Arges wrote: Some architectures like POWER can have a NUMA node_possible_map that contains sparse entries. This causes memory corruption with openvswitch since it allocates flow_cache with a multiple of num_possible_nodes() and Couldn't this also be

Re: [PATCH] openvswitch: make for_each_node loops work with sparse numa systems

2015-07-21 Thread Nishanth Aravamudan
On 21.07.2015 [11:30:58 -0500], Chris J Arges wrote: On Tue, Jul 21, 2015 at 09:24:18AM -0700, Nishanth Aravamudan wrote: On 21.07.2015 [10:32:34 -0500], Chris J Arges wrote: Some architectures like POWER can have a NUMA node_possible_map that contains sparse entries. This causes memory

Re: [PATCH v2] openvswitch: allocate nr_node_ids flow_stats instead of num_possible_nodes

2015-07-21 Thread Nishanth Aravamudan
: 3af229f2071f5b5cb31664be6109561fbe19c861 Signed-off-by: Chris J Arges chris.j.ar...@canonical.com Acked-by: Nishanth Aravamudan n...@linux.vnet.ibm.com --- net/openvswitch/flow_table.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/openvswitch/flow_table.c b/net/openvswitch
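
Why sizing a per-node array by the number of possible nodes goes wrong on a sparse node map can be shown with a short userspace sketch. The helpers are hypothetical (`sketch_` names); they only mimic the distinction between a count of nodes and the highest node id plus one, and they assume node ids are non-negative.

```c
#include <stddef.h>

/*
 * Highest node id plus one, analogous in spirit to nr_node_ids.
 * Assumes every id in node_ids is non-negative.
 */
static size_t sketch_nr_node_ids(const int *node_ids, size_t count)
{
    size_t max_plus_one = 0;

    for (size_t i = 0; i < count; i++) {
        size_t candidate = (size_t)node_ids[i] + 1;

        if (candidate > max_plus_one)
            max_plus_one = candidate;
    }
    return max_plus_one;
}

/* An array indexed by node id is safe only if the id is below its length. */
static int sketch_index_in_bounds(int node_id, size_t array_len)
{
    return node_id >= 0 && (size_t)node_id < array_len;
}
```

With possible nodes 0, 1, 16 and 17, the node count is 4 but the highest id is 17, so an array of 4 entries indexed by node id is overrun, while an array of 18 entries (highest id plus one) is not.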

Re: [RFC PATCH 1/2] powerpc/numa: fix cpu_to_node() usage during boot

2015-07-15 Thread Nishanth Aravamudan
On 15.07.2015 [16:35:16 -0400], Tejun Heo wrote: Hello, On Thu, Jul 02, 2015 at 04:02:02PM -0700, Nishanth Aravamudan wrote: we currently emit at boot: [0.00] pcpu-alloc: [0] 0 1 2 3 [0] 4 5 6 7 After this commit, we correctly emit: [0.00] pcpu-alloc: [0] 0 1

Re: [PATCH 5/6] [RFC] crypto/testmgr: add null test for 842 algorithm

2015-07-13 Thread Nishanth Aravamudan
On 13.07.2015 [17:05:36 -0700], Nishanth Aravamudan wrote: On 04.07.2015 [15:24:53 +0800], Herbert Xu wrote: On Thu, Jul 02, 2015 at 03:41:19PM -0700, Nishanth Aravamudan wrote: Currently, when the nx-842-pseries driver loads, the following message is emitted: alg: No test for 842

Re: [PATCH 5/6] [RFC] crypto/testmgr: add null test for 842 algorithm

2015-07-13 Thread Nishanth Aravamudan
On 04.07.2015 [15:24:53 +0800], Herbert Xu wrote: On Thu, Jul 02, 2015 at 03:41:19PM -0700, Nishanth Aravamudan wrote: Currently, when the nx-842-pseries driver loads, the following message is emitted: alg: No test for 842 (842-nx) It seems like the simplest way to fix this message

Re: [RFC,1/2] powerpc/numa: fix cpu_to_node() usage during boot

2015-07-10 Thread Nishanth Aravamudan
On 08.07.2015 [16:16:23 -0700], Nishanth Aravamudan wrote: On 08.07.2015 [14:00:56 +1000], Michael Ellerman wrote: On Thu, 2015-02-07 at 23:02:02 UTC, Nishanth Aravamudan wrote: Much like on x86, now that powerpc is using USE_PERCPU_NUMA_NODE_ID, we have an ordering issue during boot

Re: [RFC PATCH 1/2] powerpc/numa: fix cpu_to_node() usage during boot

2015-07-10 Thread Nishanth Aravamudan
On 08.07.2015 [18:22:09 -0700], David Rientjes wrote: On Thu, 2 Jul 2015, Nishanth Aravamudan wrote: Much like on x86, now that powerpc is using USE_PERCPU_NUMA_NODE_ID, we have an ordering issue during boot with early calls to cpu_to_node(). The value returned by those calls now depend

Re: [RFC,1/2] powerpc/numa: fix cpu_to_node() usage during boot

2015-07-08 Thread Nishanth Aravamudan
On 08.07.2015 [14:00:56 +1000], Michael Ellerman wrote: On Thu, 2015-02-07 at 23:02:02 UTC, Nishanth Aravamudan wrote: Much like on x86, now that powerpc is using USE_PERCPU_NUMA_NODE_ID, we have an ordering issue during boot with early calls to cpu_to_node(). now that .. implies we

Re: [PATCH 6/6] nx-842-platform: if NX842 platform drivers are not modules, don't try to load them

2015-07-06 Thread Nishanth Aravamudan
On 06.07.2015 [16:13:07 +0800], Herbert Xu wrote: On Thu, Jul 02, 2015 at 03:42:26PM -0700, Nishanth Aravamudan wrote: Based off the CONFIG_SPU_FS_MODULE code, only attempt to load platform modules if the nx-842 pseries/powernv drivers are built as modules. Otherwise

[PATCH v2] crypto/nx-842-{powerpc,pseries}: reduce chattiness of platform drivers

2015-07-06 Thread Nishanth Aravamudan
On 03.07.2015 [11:30:32 +1000], Michael Ellerman wrote: On Thu, 2015-07-02 at 15:40 -0700, Nishanth Aravamudan wrote: While we never would successfully load on the wrong machine type, there is extra output by default regardless of machine type. For instance, on a PowerVM LPAR, we see

[PATCH 5/6] [RFC] crypto/testmgr: add null test for 842 algorithm

2015-07-02 Thread Nishanth Aravamudan
Currently, when the nx-842-pseries driver loads, the following message is emitted: alg: No test for 842 (842-nx) It seems like the simplest way to fix this message (other than adding a proper test) is to just insert the null test into the list in the testmgr. Signed-off-by: Nishanth Aravamudan

[PATCH 6/6] nx-842-platform: if NX842 platform drivers are not modules, don't try to load them

2015-07-02 Thread Nishanth Aravamudan
platform driver. Signed-off-by: Nishanth Aravamudan n...@linux.vnet.ibm.com Cc: Dan Streetman ddstr...@us.ibm.com Cc: Herbert Xu herb...@gondor.apana.org.au Cc: David S. Miller da...@davemloft.net Cc: linux-cry...@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org --- drivers/crypto/nx/nx-842

[RFC PATCH 1/2] powerpc/numa: fix cpu_to_node() usage during boot

2015-07-02 Thread Nishanth Aravamudan
] 0 1 2 3 [1] 4 5 6 7 Signed-off-by: Nishanth Aravamudan n...@linux.vnet.ibm.com diff --git a/arch/powerpc/include/asm/topology.h b/arch/powerpc/include/asm/topology.h index 5f1048e..f2c4c89 100644 --- a/arch/powerpc/include/asm/topology.h +++ b/arch/powerpc/include/asm/topology.h @@ -39,6 +39,8

[RFC PATCH 2/2] powerpc/smp: use early_cpu_to_node() instead of direct references to numa_cpu_lookup_table

2015-07-02 Thread Nishanth Aravamudan
A simple move to a wrapper function to numa_cpu_lookup_table, now that power has the early_cpu_to_node() API. Signed-off-by: Nishanth Aravamudan n...@linux.vnet.ibm.com diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c index ec9ec20..7bf333b 100644 --- a/arch/powerpc/kernel

[PATCH 0/6] drivers/nx-842: reduce verbosity of logging

2015-07-02 Thread Nishanth Aravamudan
Currently, on a LPAR with the nx-842 device disabled, the following messages are emitted: nx_compress: no nx842 driver found. [1] Registering IBM Power 842 compression driver nx_compress_pseries ibm,compression-v1: nx842_OF_upd_status: status 'disabled' is not 'okay' nx_compress_pseries

[PATCH 2/6] nx-842-pseries: rename nx842_{init,exit} to nx842_pseries_{init,exit}

2015-07-02 Thread Nishanth Aravamudan
While there is no technical reason that both nx-842.c and nx-842-pseries.c cannot have the same name for the init/exit functions, it is a bit confusing with initcall_debug. Rename the pseries specific functions appropriately. Signed-off-by: Nishanth Aravamudan n...@linux.vnet.ibm.com --- drivers

[PATCH 1/6] crypto/nx-842-pseries: nx842_OF_upd_status should return ENODEV if device is not 'okay'

2015-07-02 Thread Nishanth Aravamudan
an extra error in that case. It seems like the proper return code of a disabled device is ENODEV. Signed-off-by: Nishanth Aravamudan n...@linux.vnet.ibm.com --- drivers/crypto/nx/nx-842-pseries.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/crypto/nx/nx-842-pseries.c b

[PATCH 3/6] nx-842-pseries: do not emit extra output if status is disabled

2015-07-02 Thread Nishanth Aravamudan
, and we are going to emit that the device is disabled, only print out a non-'okay' status if it is not 'disabled'. Signed-off-by: Nishanth Aravamudan n...@linux.vnet.ibm.com --- drivers/crypto/nx/nx-842-pseries.c | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/drivers

[PATCH 4/6] crypto/nx-842-{powerpc,pseries}: only load on the appropriate machine type

2015-07-02 Thread Nishanth Aravamudan
never be found. Similar pseries messages are printed on powernv. Signed-off-by: Nishanth Aravamudan n...@linux.vnet.ibm.com --- drivers/crypto/nx/nx-842-powernv.c | 6 ++ drivers/crypto/nx/nx-842-pseries.c | 6 ++ drivers/crypto/nx/nx-842.h | 1 + 3 files changed, 13 insertions

Re: powerpc,numa: Memory hotplug to memory-less nodes ?

2015-06-25 Thread Nishanth Aravamudan
On 24.06.2015 [07:13:36 -0500], Nathan Fontenot wrote: On 06/23/2015 11:01 PM, Bharata B Rao wrote: So will it be correct to say that memory hotplug to memory-less node isn't supported by PowerPC kernel ? Should I enforce the same in QEMU for PowerKVM ? I'm not sure if that is correct.

Re: [PATCH v2] mm: vmscan: do not throttle based on pfmemalloc reserves if node has no reclaimable pages

2015-05-08 Thread Nishanth Aravamudan
On 08.05.2015 [15:47:26 -0700], Andrew Morton wrote: On Wed, 06 May 2015 11:28:12 +0200 Vlastimil Babka vba...@suse.cz wrote: On 05/06/2015 12:09 AM, Nishanth Aravamudan wrote: On 03.04.2015 [10:45:56 -0700], Nishanth Aravamudan wrote: What I find somewhat worrying though is that we

Re: [PATCH v2] mm: vmscan: do not throttle based on pfmemalloc reserves if node has no reclaimable pages

2015-05-05 Thread Nishanth Aravamudan
On 03.04.2015 [10:45:56 -0700], Nishanth Aravamudan wrote: On 03.04.2015 [09:57:35 +0200], Vlastimil Babka wrote: On 03/31/2015 11:48 AM, Michal Hocko wrote: On Fri 27-03-15 15:23:50, Nishanth Aravamudan wrote: On 27.03.2015 [13:17:59 -0700], Dave Hansen wrote: On 03/27/2015 12:28 PM

Re: [PATCH] of: return NUMA_NO_NODE from fallback of_node_to_nid()

2015-04-10 Thread Nishanth Aravamudan
On 10.04.2015 [14:37:19 +0300], Konstantin Khlebnikov wrote: On 10.04.2015 01:58, Nishanth Aravamudan wrote: On 09.04.2015 [07:27:28 +0300], Konstantin Khlebnikov wrote: On Thu, Apr 9, 2015 at 2:07 AM, Nishanth Aravamudan n...@linux.vnet.ibm.com wrote: On 08.04.2015 [20:04:04 +0300

Re: Topology updates and NUMA-level sched domains

2015-04-10 Thread Nishanth Aravamudan
On 10.04.2015 [11:08:10 +0200], Peter Zijlstra wrote: On Fri, Apr 10, 2015 at 10:31:53AM +0200, Peter Zijlstra wrote: Please, step back, look at what you're doing and ask yourself, will any sane person want to use this? Can they use this? If so, start by describing the desired user

Re: Topology updates and NUMA-level sched domains

2015-04-10 Thread Nishanth Aravamudan
On 10.04.2015 [10:31:53 +0200], Peter Zijlstra wrote: On Thu, Apr 09, 2015 at 03:29:56PM -0700, Nishanth Aravamudan wrote: No, that's very much not the same. Even if it were dealing with hotplug it would still assume the cpu to return to the same node. The analogy may have been poor

Re: [PATCH] of: return NUMA_NO_NODE from fallback of_node_to_nid()

2015-04-08 Thread Nishanth Aravamudan
On 08.04.2015 [20:04:04 +0300], Konstantin Khlebnikov wrote: On 08.04.2015 19:59, Konstantin Khlebnikov wrote: Node 0 might be offline as well as any other numa node, in this case kernel cannot handle memory allocation and crashes. Isn't the bug that numa_node_id() returned an offline node?

Re: Topology updates and NUMA-level sched domains

2015-04-07 Thread Nishanth Aravamudan
On 07.04.2015 [12:21:47 +0200], Peter Zijlstra wrote: On Mon, Apr 06, 2015 at 02:45:58PM -0700, Nishanth Aravamudan wrote: Hi Peter, As you are very aware, I think, power has some odd NUMA topologies (and changes to those topologies) at run-time. In particular, we can see

Topology updates and NUMA-level sched domains

2015-04-06 Thread Nishanth Aravamudan
Hi Peter, As you are very aware, I think, power has some odd NUMA topologies (and changes to those topologies) at run-time. In particular, we can see a topology at boot: Node 0: all Cpus Node 7: no cpus Then we get a notification from the hypervisor that a core (or two) have moved from node

Re: [PATCH v2] mm: vmscan: do not throttle based on pfmemalloc reserves if node has no reclaimable pages

2015-04-03 Thread Nishanth Aravamudan
On 03.04.2015 [20:24:45 +0200], Michal Hocko wrote: On Fri 03-04-15 10:43:57, Nishanth Aravamudan wrote: On 31.03.2015 [11:48:29 +0200], Michal Hocko wrote: [...] I would expect kswapd would be looping endlessly because the zone wouldn't be balanced obviously. But I would be wrong

Re: [PATCH v2] mm: vmscan: do not throttle based on pfmemalloc reserves if node has no reclaimable pages

2015-04-03 Thread Nishanth Aravamudan
On 31.03.2015 [11:48:29 +0200], Michal Hocko wrote: On Fri 27-03-15 15:23:50, Nishanth Aravamudan wrote: On 27.03.2015 [13:17:59 -0700], Dave Hansen wrote: On 03/27/2015 12:28 PM, Nishanth Aravamudan wrote: @@ -2585,7 +2585,7 @@ static bool pfmemalloc_watermark_ok(pg_data_t *pgdat

Re: [PATCH v2] mm: vmscan: do not throttle based on pfmemalloc reserves if node has no reclaimable pages

2015-04-03 Thread Nishanth Aravamudan
On 03.04.2015 [09:57:35 +0200], Vlastimil Babka wrote: On 03/31/2015 11:48 AM, Michal Hocko wrote: On Fri 27-03-15 15:23:50, Nishanth Aravamudan wrote: On 27.03.2015 [13:17:59 -0700], Dave Hansen wrote: On 03/27/2015 12:28 PM, Nishanth Aravamudan wrote: @@ -2585,7 +2585,7 @@ static bool

[PATCH] mm: vmscan: do not throttle based on pfmemalloc reserves if node has no reclaimable zones

2015-03-27 Thread Nishanth Aravamudan
-mentioned 16M hugepage allocation succeeds and correctly round-robins between Nodes 1 and 3. Signed-off-by: Nishanth Aravamudan n...@linux.vnet.ibm.com diff --git a/mm/vmscan.c b/mm/vmscan.c index dcd90c8..033c2b7 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -2585,7 +2585,7 @@ static bool

Re: [PATCH] mm: vmscan: do not throttle based on pfmemalloc reserves if node has no reclaimable zones

2015-03-27 Thread Nishanth Aravamudan
[ Sorry, typo'd anton's address ] On 27.03.2015 [12:28:50 -0700], Nishanth Aravamudan wrote: Based upon 675becce15 (mm: vmscan: do not throttle based on pfmemalloc reserves if node has no ZONE_NORMAL) from Mel. We have a system with the following topology: (0) root @ br30p03: /root

[PATCH v2] mm: vmscan: do not throttle based on pfmemalloc reserves if node has no reclaimable pages

2015-03-27 Thread Nishanth Aravamudan
On 27.03.2015 [13:17:59 -0700], Dave Hansen wrote: On 03/27/2015 12:28 PM, Nishanth Aravamudan wrote: @@ -2585,7 +2585,7 @@ static bool pfmemalloc_watermark_ok(pg_data_t *pgdat) for (i = 0; i <= ZONE_NORMAL; i++) { zone = &pgdat->node_zones[i

Re: new decimal conversion - seeking testers

2015-03-12 Thread Nishanth Aravamudan
On 13.03.2015 [00:09:19 +0100], Rasmus Villemoes wrote: Hi, I've proposed a new implementation of decimal conversion for lib/vsprintf.c; see http://thread.gmane.org/gmane.linux.kernel/1892035/focus=1905478. Benchmarking so far shows 25-50% (depending on distribution of input numbers)

[PATCH v3] powerpc/numa: set node_possible_map to only node_online_map during boot

2015-03-10 Thread Nishanth Aravamudan
On 10.03.2015 [10:55:05 +1100], Michael Ellerman wrote: On Thu, 2015-03-05 at 21:27 -0800, Nishanth Aravamudan wrote: diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c index 0257a7d659ef..0c1716cd271f 100644 --- a/arch/powerpc/mm/numa.c +++ b/arch/powerpc/mm/numa.c @@ -958,6

Re: [RFC PATCH] powerpc/numa: reset node_possible_map to only node_online_map

2015-03-05 Thread Nishanth Aravamudan
Hi David, On 05.03.2015 [13:16:35 -0800], David Rientjes wrote: On Thu, 5 Mar 2015, Nishanth Aravamudan wrote: diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c index 0257a7d659ef..24de29b3651b 100644 --- a/arch/powerpc/mm/numa.c +++ b/arch/powerpc/mm/numa.c @@ -958,9

Re: [RFC PATCH] powerpc/numa: reset node_possible_map to only node_online_map

2015-03-05 Thread Nishanth Aravamudan
On 05.03.2015 [13:58:27 -0800], David Rientjes wrote: On Fri, 6 Mar 2015, Michael Ellerman wrote: diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c index 0257a7d659ef..24de29b3651b 100644 --- a/arch/powerpc/mm/numa.c +++ b/arch/powerpc/mm/numa.c @@ -958,9 +958,17

Re: [RFC PATCH] powerpc/numa: reset node_possible_map to only node_online_map

2015-03-05 Thread Nishanth Aravamudan
On 05.03.2015 [17:13:08 -0500], Tejun Heo wrote: On Thu, Mar 05, 2015 at 10:05:49AM -0800, Nishanth Aravamudan wrote: While looking at this, I noticed that nr_node_ids is actually a misnomer, it seems. It's not the number, but the maximum_node_id, as with sparse NUMA nodes, you might only

Re: [RFC PATCH] powerpc/numa: reset node_possible_map to only node_online_map

2015-03-05 Thread Nishanth Aravamudan
On 06.03.2015 [08:48:52 +1100], Michael Ellerman wrote: On Thu, 2015-03-05 at 13:16 -0800, David Rientjes wrote: On Thu, 5 Mar 2015, Nishanth Aravamudan wrote: diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c index 0257a7d659ef..24de29b3651b 100644 --- a/arch/powerpc/mm

Re: [RFC PATCH] powerpc/numa: reset node_possible_map to only node_online_map

2015-03-05 Thread Nishanth Aravamudan
On 05.03.2015 [17:08:04 -0500], Tejun Heo wrote: Hello, On Thu, Mar 05, 2015 at 01:58:27PM -0800, David Rientjes wrote: I'm not sure why this is being proposed as a powerpc patch and not a patch for mem_cgroup_css_alloc(). In other words, why do we have to allocate for all possible

[RFC PATCH] powerpc/numa: reset node_possible_map to only node_online_map

2015-03-05 Thread Nishanth Aravamudan
, node_online_map), but I think the cost of anding the two will always be higher than zero and set a few bits in practice. Signed-off-by: Nishanth Aravamudan n...@linux.vnet.ibm.com --- While looking at this, I noticed that nr_node_ids is actually a misnomer, it seems. It's not the number
