On Sat, Nov 10, 2018 at 03:48:14AM +0000, Elliott, Robert (Persistent Memory) wrote:
> > -----Original Message-----
> > From: [email protected] <linux-kernel-
> > [email protected]> On Behalf Of Daniel Jordan
> > Sent: Monday, November 05, 2018 10:56 AM
> > Subject: [RFC PATCH v4 11/13] mm: parallelize deferred struct page
> > initialization within each node
> > 
> > ...  The kernel doesn't
> > know the memory bandwidth of a given system to get the most efficient
> > number of threads, so there's some guesswork involved.  
> 
> The ACPI HMAT (Heterogeneous Memory Attribute Table) is designed to report
> that kind of information, and could facilitate automatic tuning.
> 
> There was discussion last year about kernel support for it:
> https://lore.kernel.org/lkml/[email protected]/

Thanks for bringing this up.  I'm traveling but will take a closer look when I
get back.

> > In testing, a reasonable value turned out to be about a quarter of the
> > CPUs on the node.
> ...
> > +   /*
> > +    * We'd like to know the memory bandwidth of the chip to calculate the
> > +    * most efficient number of threads to start, but we can't.
> > +    * In testing, a good value for a variety of systems was a quarter of
> > +    * the CPUs on the node.
> > +    */
> > +   nr_node_cpus = DIV_ROUND_UP(cpumask_weight(cpumask), 4);
> 
> 
> You might want to base that calculation on and limit the threads to
> physical cores, not hyperthreaded cores.

Why?  Hyperthreads can be beneficial when waiting on memory.  That said, I
don't have data that shows that in this case.
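
To make the difference concrete, here is a rough userspace sketch of the two
ways of sizing the thread count.  It is not kernel code: the helper names are
made up for illustration, and the uniform SMT width passed to the second
helper is an assumption (real kernel code would have to look at the sibling
masks instead).  Only the DIV_ROUND_UP() arithmetic matches the hunk above.

```c
/* Same rounding as the kernel's DIV_ROUND_UP() macro. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/*
 * As in the quoted hunk: a quarter of the node's logical CPUs, rounded up.
 * nr_logical_cpus stands in for cpumask_weight(cpumask).
 */
static unsigned int threads_from_logical(unsigned int nr_logical_cpus)
{
	return DIV_ROUND_UP(nr_logical_cpus, 4);
}

/*
 * Hypothetical alternative: a quarter of the physical cores, rounded up.
 * Assumes every core has the same number of hardware threads (smt_width).
 */
static unsigned int threads_from_cores(unsigned int nr_logical_cpus,
				       unsigned int smt_width)
{
	unsigned int nr_cores = nr_logical_cpus / smt_width;

	return DIV_ROUND_UP(nr_cores, 4);
}
```

On a node with 48 logical CPUs and two threads per core, the first helper
gives 12 threads and the second gives 6, so the choice halves the thread
count on such a system.  Which one is faster is exactly the open question.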
