On Tue, Sep 3, 2019 at 10:18 PM Aneesh Kumar K.V
<[email protected]> wrote:
>
> On 9/4/19 5:48 AM, Dan Williams wrote:
> > Hi Aneesh,
> >
> > Patches 1 through 6 look ok, but I feel compelled to point out that
> > the patches deploy a different indentation style than the surrounding
> > code. I rely on the vim "=" operator to indent 2 tabs when for
> > multi-line if statements, i.e. this:
> >
> >          if (!nd_supported_alignment(align) &&
> >                          !memcmp(pfn_sb->signature, DAX_SIG, PFN_SIG_LEN)) {
> >                  dev_err(&nd_pfn->dev, "init failed, alignment mismatch: "
> >                                  "%ld:%ld\n", nd_pfn->align, align);
> >                  return -EOPNOTSUPP;
> >          }
> >
> > ...instead of this:
> >
> >          if (!nd_supported_alignment(align) &&
> >              !memcmp(pfn_sb->signature, DAX_SIG, PFN_SIG_LEN)) {
> >                  dev_err(&nd_pfn->dev, "init failed, alignment mismatch: "
> >                          "%ld:%ld\n", nd_pfn->align, align);
> >                  return -EOPNOTSUPP;
> >          }
> >
> > Could you reflow / resend for v7 after addressing the below comments?
> > To be clear I don't mind the lined up style, but mixed style within
> > the same file is distracting.
> >
>
>
> I sent a v7 and then realized there are more feedback below. Let me
> reply here and then possibly send only this patch updated if needed

Ok.

> > On Mon, Aug 19, 2019 at 6:35 AM Aneesh Kumar K.V
> > <[email protected]> wrote:
> >>
> >> Allow arch to provide the supported alignments and use hugepage alignment 
> >> only
> >> if we support hugepage. Right now we depend on compile time configs 
> >> whereas this
> >> patch switch this to runtime discovery.
> >>
> >> Architectures like ppc64 can have THP enabled in code, but then can have
> >> hugepage size disabled by the hypervisor. This allows us to create dax 
> >> devices
> >> with PAGE_SIZE alignment in this case.
> >>
> >> Existing dax namespace with alignment larger than PAGE_SIZE will fail to
> >> initialize in this specific case. We still allow fsdax namespace 
> >> initialization.
> >>
> >> With respect to identifying whether to enable hugepage fault for a dax 
> >> device,
> >> if THP is enabled during compile, we default to taking hugepage fault and 
> >> in dax
> >> fault handler if we find the fault size > alignment we retry with PAGE_SIZE
> >> fault size.
> >>
> >> This also addresses the below failure scenario on ppc64
> >>
> >> ndctl create-namespace --mode=devdax  | grep align
> >>   "align":16777216,
> >>   "align":16777216
> >>
> >> cat /sys/devices/ndbus0/region0/dax0.0/supported_alignments
> >>   65536 16777216
> >>
> >> daxio.static-debug  -z -o /dev/dax0.0
> >>    Bus error (core dumped)
> >>
> >>    $ dmesg | tail
> >>     lpar: Failed hash pte insert with error -4
> >>     hash-mmu: mm: Hashing failure ! EA=0x7fff17000000 
> >> access=0x8000000000000006 current=daxio
> >>     hash-mmu:     trap=0x300 vsid=0x22cb7a3 ssize=1 base psize=2 psize 10 
> >> pte=0xc000000501002b86
> >>     daxio[3860]: bus error (7) at 7fff17000000 nip 7fff973c007c lr 
> >> 7fff973bff34 code 2 in libpmem.so.1.0.0[7fff973b0000+20000]
> >>     daxio[3860]: code: 792945e4 7d494b78 e95f0098 7d494b78 f93f00a0 
> >> 4800012c e93f0088 f93f0120
> >>     daxio[3860]: code: e93f00a0 f93f0128 e93f0120 e95f0128 <f9490000> 
> >> e93f0088 39290008 f93f0110
> >>
> >> The failure was due to guest kernel using wrong page size.
> >>
> >> The namespaces created with 16M alignment will appear as below on a config 
> >> with
> >> 16M page size disabled.
> >>
> >> $ ndctl list -Ni
> >> [
> >>    {
> >>      "dev":"namespace0.1",
> >>      "mode":"fsdax",
> >>      "map":"dev",
> >>      "size":5351931904,
> >>      "uuid":"fc6e9667-461a-4718-82b4-69b24570bddb",
> >>      "align":16777216,
> >>      "blockdev":"pmem0.1",
> >>      "supported_alignments":[
> >>        65536
> >>      ]
> >>    },
> >>    {
> >>      "dev":"namespace0.0",
> >>      "mode":"fsdax",    <==== devdax 16M alignment marked disabled.
> >
> > Why is fsdax disabled? It's fine for the fsdax case to fall back to
> > smaller faults. Or, this running into the case where the stored page
> > size is larger than the currently running PAGE_SIZE?
> >
> >>      "map":"mem",
> >>      "size":5368709120,
> >>      "uuid":"a4bdf81a-f2ee-4bc6-91db-7b87eddd0484",
> >>      "state":"disabled"
> >>    }
> >> ]
> >>
>
>
> That is how ndctl print the output. It is actually the devdax namespace
> which is marked disabled. That is why I added this as a comment next to
> that "mode": line

Oh, thanks for the clarification. That is indeed confusing. I suspect
this is due to the fact that of_pmem sets ND_REGION_PAGEMAP, so the
default state is "fsdax" instead of "raw" like other namespace types.

I wonder if we should hide "mode" by default when the namespace is
disabled? Otherwise just might need to document that "mode" is
unreliable when the namespace is disabled.

>
>
> >> Signed-off-by: Aneesh Kumar K.V <[email protected]>
> >> ---
> >>   arch/powerpc/include/asm/libnvdimm.h |  9 ++++++++
> >>   arch/powerpc/mm/Makefile             |  1 +
> >>   arch/powerpc/mm/nvdimm.c             | 34 ++++++++++++++++++++++++++++
> >>   arch/x86/include/asm/libnvdimm.h     | 19 ++++++++++++++++
> >>   drivers/nvdimm/nd.h                  |  6 -----
> >>   drivers/nvdimm/pfn_devs.c            | 32 +++++++++++++++++++++++++-
> >>   include/linux/huge_mm.h              |  7 +++++-
> >>   7 files changed, 100 insertions(+), 8 deletions(-)
> >>   create mode 100644 arch/powerpc/include/asm/libnvdimm.h
> >>   create mode 100644 arch/powerpc/mm/nvdimm.c
> >>   create mode 100644 arch/x86/include/asm/libnvdimm.h
> >>
> >> diff --git a/arch/powerpc/include/asm/libnvdimm.h 
> >> b/arch/powerpc/include/asm/libnvdimm.h
> >> new file mode 100644
> >> index 000000000000..d35fd7f48603
> >> --- /dev/null
> >> +++ b/arch/powerpc/include/asm/libnvdimm.h
> >> @@ -0,0 +1,9 @@
> >> +/* SPDX-License-Identifier: GPL-2.0 */
> >> +#ifndef _ASM_POWERPC_LIBNVDIMM_H
> >> +#define _ASM_POWERPC_LIBNVDIMM_H
> >> +
> >> +#define nd_pfn_supported_alignments nd_pfn_supported_alignments
> >> +extern unsigned long *nd_pfn_supported_alignments(void);
> >> +extern unsigned long nd_pfn_default_alignment(void);
> >> +
> >> +#endif
> >> diff --git a/arch/powerpc/mm/Makefile b/arch/powerpc/mm/Makefile
> >> index 0f499db315d6..42e4a399ba5d 100644
> >> --- a/arch/powerpc/mm/Makefile
> >> +++ b/arch/powerpc/mm/Makefile
> >> @@ -20,3 +20,4 @@ obj-$(CONFIG_HIGHMEM)         += highmem.o
> >>   obj-$(CONFIG_PPC_COPRO_BASE)   += copro_fault.o
> >>   obj-$(CONFIG_PPC_PTDUMP)       += ptdump/
> >>   obj-$(CONFIG_KASAN)            += kasan/
> >> +obj-$(CONFIG_NVDIMM_PFN)               += nvdimm.o
> >> diff --git a/arch/powerpc/mm/nvdimm.c b/arch/powerpc/mm/nvdimm.c
> >> new file mode 100644
> >> index 000000000000..a29a4510715e
> >> --- /dev/null
> >> +++ b/arch/powerpc/mm/nvdimm.c
> >> @@ -0,0 +1,34 @@
> >> +// SPDX-License-Identifier: GPL-2.0
> >> +
> >> +#include <asm/pgtable.h>
> >> +#include <asm/page.h>
> >> +
> >> +#include <linux/mm.h>
> >> +/*
> >> + * We support only pte and pmd mappings for now.
> >> + */
> >> +const unsigned long *nd_pfn_supported_alignments(void)
> >> +{
> >> +       static unsigned long supported_alignments[3];
> >> +
> >> +       supported_alignments[0] = PAGE_SIZE;
> >> +
> >> +       if (has_transparent_hugepage())
> >> +               supported_alignments[1] = HPAGE_PMD_SIZE;
> >> +       else
> >> +               supported_alignments[1] = 0;
> >> +
> >> +       supported_alignments[2] = 0;
> >> +       return supported_alignments;
> >> +}
> >
> > I'd prefer to not create a powerpc specific object file especially
> > because nothing in the above looks powerpc specific. Anything that
> > prevents just making this the common implementation?
> >
>
>
> Support for PUD size alignment. The alignment details of what can be
> supported is largely a arch specific details. We could do that with
>
> #ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
>                 HPAGE_PUD_SIZE,
> #endif
>
> But I was looking at arch specific functions rather than having all
> those #ifdef there.

The ifdefs could still move to a header with a:

    static const unsigned long supported_alignments[3];

...definition.

>
> >> +
> >> +/*
> >> + * Use pmd mapping if supported as default alignment
> >> + */
> >> +unsigned long nd_pfn_default_alignment(void)
> >> +{
> >> +
> >> +       if (has_transparent_hugepage())
> >> +               return HPAGE_PMD_SIZE;
> >> +       return PAGE_SIZE;
> >> +}
> >> diff --git a/arch/x86/include/asm/libnvdimm.h 
> >> b/arch/x86/include/asm/libnvdimm.h
> >> new file mode 100644
> >> index 000000000000..3d5361db9164
> >> --- /dev/null
> >> +++ b/arch/x86/include/asm/libnvdimm.h
> >> @@ -0,0 +1,19 @@
> >> +/* SPDX-License-Identifier: GPL-2.0 */
> >> +#ifndef _ASM_X86_LIBNVDIMM_H
> >> +#define _ASM_X86_LIBNVDIMM_H
> >> +
> >> +static inline unsigned long nd_pfn_default_alignment(void)
> >> +{
> >> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> >> +       return HPAGE_PMD_SIZE;
> >> +#else
> >> +       return PAGE_SIZE;
> >> +#endif
> >> +}
> >> +
> >> +static inline unsigned long nd_altmap_align_size(unsigned long nd_align)
> >> +{
> >> +       return PMD_SIZE;
> >> +}
> >> +
> >> +#endif
> >> diff --git a/drivers/nvdimm/nd.h b/drivers/nvdimm/nd.h
> >> index e89af4b2d8e9..401a78b02116 100644
> >> --- a/drivers/nvdimm/nd.h
> >> +++ b/drivers/nvdimm/nd.h
> >> @@ -289,12 +289,6 @@ static inline struct device *nd_btt_create(struct 
> >> nd_region *nd_region)
> >>   struct nd_pfn *to_nd_pfn(struct device *dev);
> >>   #if IS_ENABLED(CONFIG_NVDIMM_PFN)
> >>
> >> -#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> >> -#define PFN_DEFAULT_ALIGNMENT HPAGE_PMD_SIZE
> >> -#else
> >> -#define PFN_DEFAULT_ALIGNMENT PAGE_SIZE
> >> -#endif
> >> -
> >>   int nd_pfn_probe(struct device *dev, struct nd_namespace_common *ndns);
> >>   bool is_nd_pfn(struct device *dev);
> >>   struct device *nd_pfn_create(struct nd_region *nd_region);
> >> diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c
> >> index 22db77ac3d0e..7dc0b5da4c6b 100644
> >> --- a/drivers/nvdimm/pfn_devs.c
> >> +++ b/drivers/nvdimm/pfn_devs.c
> >> @@ -10,6 +10,7 @@
> >>   #include <linux/slab.h>
> >>   #include <linux/fs.h>
> >>   #include <linux/mm.h>
> >> +#include <asm/libnvdimm.h>
> >>   #include "nd-core.h"
> >>   #include "pfn.h"
> >>   #include "nd.h"
> >> @@ -103,6 +104,8 @@ static ssize_t align_show(struct device *dev,
> >>          return sprintf(buf, "%ld\n", nd_pfn->align);
> >>   }
> >>
> >> +#ifndef nd_pfn_supported_alignments
> >> +#define nd_pfn_supported_alignments nd_pfn_supported_alignments
> >>   static const unsigned long *nd_pfn_supported_alignments(void)
> >>   {
> >>          /*
> >> @@ -125,6 +128,7 @@ static const unsigned long 
> >> *nd_pfn_supported_alignments(void)
> >>
> >>          return data;
> >>   }
> >> +#endif
> >>
> >>   static ssize_t align_store(struct device *dev,
> >>                  struct device_attribute *attr, const char *buf, size_t 
> >> len)
> >> @@ -302,7 +306,7 @@ struct device *nd_pfn_devinit(struct nd_pfn *nd_pfn,
> >>                  return NULL;
> >>
> >>          nd_pfn->mode = PFN_MODE_NONE;
> >> -       nd_pfn->align = PFN_DEFAULT_ALIGNMENT;
> >> +       nd_pfn->align = nd_pfn_default_alignment();
> >>          dev = &nd_pfn->dev;
> >>          device_initialize(&nd_pfn->dev);
> >>          if (ndns && !__nd_attach_ndns(&nd_pfn->dev, ndns, &nd_pfn->ndns)) 
> >> {
> >> @@ -412,6 +416,20 @@ static int nd_pfn_clear_memmap_errors(struct nd_pfn 
> >> *nd_pfn)
> >>          return 0;
> >>   }
> >>
> >> +static bool nd_supported_alignment(unsigned long align)
> >> +{
> >> +       int i;
> >> +       const unsigned long *supported = nd_pfn_supported_alignments();
> >> +
> >> +       if (align == 0)
> >> +               return false;
> >> +
> >> +       for (i = 0; supported[i]; i++)
> >> +               if (align == supported[i])
> >> +                       return true;
> >> +       return false;
> >> +}
> >> +
> >>   /**
> >>    * nd_pfn_validate - read and validate info-block
> >>    * @nd_pfn: fsdax namespace runtime state / properties
> >> @@ -496,6 +514,18 @@ int nd_pfn_validate(struct nd_pfn *nd_pfn, const char 
> >> *sig)
> >>                  return -EOPNOTSUPP;
> >>          }
> >>
> >> +       /*
> >> +        * Check whether the we support the alignment. For Dax if the
> >> +        * superblock alignment is not matching, we won't initialize
> >> +        * the device.
> >> +        */
> >> +       if (!nd_supported_alignment(align) &&
> >> +           !memcmp(pfn_sb->signature, DAX_SIG, PFN_SIG_LEN)) {
> >
> > ^^^ more indentation reflow.
> >
>
> Both .clang-format and earlier emacs customization recommendation used
> the above style. I sent a  v7 with this fixup alone.

Thanks for that I appreciate the fix up on something that's a
reasonable choice either way.

>
>
> >> +               dev_err(&nd_pfn->dev, "init failed, alignment mismatch: "
> >> +                       "%ld:%ld\n", nd_pfn->align, align);
> >> +               return -EOPNOTSUPP;
> >> +       }
> >> +
> >>          if (!nd_pfn->uuid) {
> >>                  /*
> >>                   * When probing a namepace via nd_pfn_probe() the uuid
> >> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> >> index 45ede62aa85b..4fa91f9bd0da 100644
> >> --- a/include/linux/huge_mm.h
> >> +++ b/include/linux/huge_mm.h
> >> @@ -108,7 +108,12 @@ static inline bool 
> >> __transparent_hugepage_enabled(struct vm_area_struct *vma)
> >>
> >>          if (transparent_hugepage_flags & (1 << TRANSPARENT_HUGEPAGE_FLAG))
> >>                  return true;
> >> -
> >> +       /*
> >> +        * For dax let's try to do hugepage fault always. If we don't 
> >> support
> >> +        * hugepages we will not have enabled namespaces with hugepage 
> >> alignment.
> >> +        * This also means we try to handle hugepage fault on device with
> >> +        * smaller alignment. But for then we will return with 
> >> VM_FAULT_FALLBACK
> >> +        */
> >
> > Please fix this comment up to not use the word "we". This should also
> > get an ack from linux-mm folks. "We" in the above alternately means
> > "we: the runtime/compile-tims platform setting", or "we: the kernel
> > implementation".
> >
>
> I can update the comment to indicate runtime part.

Thanks!
_______________________________________________
Linux-nvdimm mailing list
[email protected]
https://lists.01.org/mailman/listinfo/linux-nvdimm

Reply via email to