On Wed, Oct 16, 2019 at 7:43 PM Aneesh Kumar K.V
<[email protected]> wrote:
>
> On 10/17/19 12:35 AM, Dan Williams wrote:
> > On Wed, Oct 16, 2019 at 9:59 AM Aneesh Kumar K.V
> > <[email protected]> wrote:
> >>
> >> On 10/16/19 9:34 PM, Dan Williams wrote:
> >>> On Tue, Oct 15, 2019 at 10:29 PM Aneesh Kumar K.V
> >>> <[email protected]> wrote:
> >>>>
> >>>> On 10/16/19 3:32 AM, Dan Williams wrote:
> >>>>> On Tue, Oct 15, 2019 at 8:33 AM Aneesh Kumar K.V
> >>>>> <[email protected]> wrote:
> >>>>>>
> >>>>>> nvdimm core currently maps the full namespace to an ioremap range
> >>>>>> while probing the namespace mode. This can result in probe failures
> >>>>>> on architectures that have limited ioremap space.
> >>>>>
> >>>>> Is there a #define for that limit?
> >>>>>
> >>>>
> >>>> It is an arch-specific #define. For example, ppc64 has different limits
> >>>> based on platform and translation mode. Hash translation with 4K
> >>>> PAGE_SIZE limits the ioremap range to 8TB.
> >>>>
> >>>>>> nvdimm core can avoid this failure by mapping only the reserve block
> >>>>>> area to check for the pfn superblock type, and mapping the full
> >>>>>> namespace resource only before using the namespace. nvdimm core uses
> >>>>>> the ioremap range only for raw and btt namespaces, and we can limit
> >>>>>> the max namespace size for these two modes. For both fsdax and devdax,
> >>>>>> this change enables nvdimm to map namespaces larger than the ioremap
> >>>>>> limit.
> >>>>>
> >>>>> If the direct map has more space I think it would be better to add a
> >>>>> way to use that to map for all namespaces rather than introduce
> >>>>> arbitrary failures based on the mode.
> >>>>>
> >>>>> I would buy a performance argument to avoid overmapping, but for
> >>>>> namespace access compatibility where an alternate mapping method would
> >>>>> succeed I think we should aim for that to be used instead. Thoughts?
> >>>>>
> >>>>
> >>>> That would require having struct pages allocated for these ranges, and
> >>>> both raw and btt don't need struct page backing?
> >>>>
> >>>
> >>> I was thinking a new mapping interface that just consumed direct-map
> >>> space, but did not allocate pages.
> >>>
> >>
> >> Not sure how easy that would be. We are looking at having part of the
> >> direct-map address range not managed by any zone, and then possibly archs
> >> need to be taught to handle these? (For example, for ppc64 we "bolt" the
> >> direct-map range, whereas we allow taking low-level hash faults for the
> >> I/O remap range.)
> >>
> >> Even though you don't consider the patch as complete, considering the
> >> approach you outlined would require larger changes, do you think this
> >> patch can be accepted as a bug fix? Right now we can fail namespace
> >> initialization during boot or ndctl enable-namespace all.
> >>
> >> For example, with ppc64 and an I/O remap range limit of 8TB, we can
> >> individually create a 6TB namespace. We also allow creating multiple
> >> such namespaces. But if we try to enable them all together using ndctl
> >> enable-namespace all, that will fail with the error:
> >>
> >> [   54.259910] vmap allocation for size x failed: use vmalloc=<size> to increase size
> >>
> >> because we probe these namespaces in parallel.
> >
> > The patch is incomplete, right?
>
> Incomplete with respect to the detail that we still don't allow large
> raw and btt namespaces.
>
>
> > It does not fix the raw mode namespace
> > case, and that error message seems to indicate to the user how to fix
> > the problem. I was of the impression it was a fixed range in the
> > address map. Could you instead try to autodetect the potential pmem
> > usage and auto increase the vmap space?
> >
> The error is printed by generic code, and the failures are due to the
> fixed size of the range. We can't work around that using the
> vmalloc=<size> option.

Darn.

Ok, explain to me again how this patch helps. This just seems to delay
the inevitable failure a bit, but the end result is that the user
needs to pick and choose which namespaces to enable after the kernel
has tried to auto-probe namespaces.
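
To make the numbers in the quoted example concrete, here is a rough sketch
of the arithmetic. It is not kernel code. It assumes all probe mappings are
held at the same time (the parallel-probe worst case), and the names
probe_all and probe_bytes, as well as the 8192-byte probe size, are
hypothetical values picked for illustration only.

```python
TB = 1 << 40  # bytes in one terabyte (binary)


def probe_all(namespace_sizes, ioremap_limit, probe_bytes=None):
    """Return the indices of namespaces whose mapping would not fit.

    probe_bytes=None models mapping the whole namespace.
    A number models mapping only that many bytes per namespace.
    """
    used = 0
    failed = []
    for index, size in enumerate(namespace_sizes):
        need = size if probe_bytes is None else min(size, probe_bytes)
        if used + need > ioremap_limit:
            failed.append(index)  # no room left in the fixed range
        else:
            used += need
    return failed


sizes = [6 * TB, 6 * TB]   # two namespaces, as in the quoted example
limit = 8 * TB             # ppc64 hash, 4K PAGE_SIZE, per the thread

# Mapping each namespace in full: the second one no longer fits.
full_map_failures = probe_all(sizes, limit)

# Mapping only a small (hypothetical) reserve area per namespace.
small_map_failures = probe_all(sizes, limit, probe_bytes=8192)

print(full_map_failures, small_map_failures)
```

The first call also models what happens if both namespaces are later
enabled in raw or btt mode, since those modes still map the full range
through ioremap space.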
_______________________________________________
Linux-nvdimm mailing list -- [email protected]
To unsubscribe send an email to [email protected]
