Nice! Valmon's legacy lives on.

Barret Rhoden <[email protected]> wrote on Wed, Dec 13, 2017 at 14:38:

> Hi -
>
> File this under "better late than never."
>
> I merged this patch and added a few more patches to make the changes I
> mentioned in the code review and to fix a couple other bugs.
>
> For instance, the topology code wasn't identifying NUMA nodes properly,
> and NUMA ended up turned off - at least on all of my machines.  I fixed
> that up and a few other topology-related things.
>
> The most user-visible change is that 'prov' grants cores based on a list
> (like perf) instead of single items, and it can spawn processes.  That
> required moving some perf code to parlib.
>
> Here's an example of prov with the new 'packed' scheduler:
>
> bash-4.3$ prov -tc -v39-46 pthread_test 1000 1000000 10 &
>
> bash-4.3$ prov -s
> ...
> Core 37, prov: 0(0x0000000000000000) alloc: 0(0x0000000000000000)
> Core 38, prov: 0(0x0000000000000000) alloc: 42(0xffff8000150d2100)
> Core 39, prov: 42(0xffff8000150d2100) alloc: 42(0xffff8000150d2100)
> Core 40, prov: 42(0xffff8000150d2100) alloc: 42(0xffff8000150d2100)
> Core 41, prov: 42(0xffff8000150d2100) alloc: 42(0xffff8000150d2100)
> Core 42, prov: 42(0xffff8000150d2100) alloc: 42(0xffff8000150d2100)
> Core 43, prov: 42(0xffff8000150d2100) alloc: 42(0xffff8000150d2100)
> Core 44, prov: 42(0xffff8000150d2100) alloc: 42(0xffff8000150d2100)
> Core 45, prov: 42(0xffff8000150d2100) alloc: 42(0xffff8000150d2100)
> Core 46, prov: 42(0xffff8000150d2100) alloc: 42(0xffff8000150d2100)
> Core 47, prov: 0(0x0000000000000000) alloc: 42(0xffff8000150d2100)
> Core 48, prov: 0(0x0000000000000000) alloc: 0(0x0000000000000000)
> ...
>
> Note the scheduler gave out the provisioned cores first (as is usual).
> When it looked for more cores, it gave out cores that are hyperthreaded
> to existing cores (38 is a sibling of 39, and 47 is a sibling of 46).
> If it gave out any more cores, they would come from the same socket as
> the existing cores.
>
> I haven't tested it much, so we'll see.  Nothing requires a toolchain
> rebuild, but some of the parlib header changes actually affect it.  So
> rebuild if you're paranoid.
>
>
> The following changes since commit ef39a1bbb55afb3e89f311e474cd8cb906cb49ca:
>
>   vmm: Provide a fast-path for IPIs in the kernel (2017-12-12 14:51:57 -0500)
>
> are available in the Git repository at:
>
>   [email protected]:brho/akaros.git topo
>
> for you to fetch changes up to 3d407708b8bc4bb5f202aad7582f1a40034e37a3:
>
>   sched: Slightly fix up tests/prov (2017-12-13 17:34:46 -0500)
>
> ----------------------------------------------------------------
> View this online at:
> https://github.com/brho/akaros/compare/ef39a1bbb55a...3d407708b8bc
>
> ----------------------------------------------------------------
> Barret Rhoden (12):
>       vmx: Squelch per-core startup messages
>       x86: Fix topology detection
>       sched: Remove the idle core interface
>       sched: Fix packed initialization
>       sched: When disabling SMT, turn off the odd cores
>       sched: Touch up the packed scheduler
>       parlib: Make bitmask.h more compilable
>       parlib: Move perf's xlib to parlib
>       perf: Rename the ros_ core_set code
>       parlib: Move core_sets to parlib
>       parlib: Move the provisioning of cores to a PID
>       sched: Slightly fix up tests/prov
>
> Kevin Klues (1):
>       Add implementation of packed core alloc strategy
>
>  Kconfig                                            |   8 +
>  kern/arch/x86/topology.c                           |  11 +-
>  kern/arch/x86/vmm/intel/vmx.c                      |   1 -
>  kern/include/corealloc_packed.h                    |  46 ++
>  kern/include/corerequest.h                         |  11 +-
>  kern/include/schedule.h                            |  10 -
>  kern/src/Kbuild                                    |   1 +
>  kern/src/corealloc_fcfs.c                          |  82 +---
>  kern/src/corealloc_packed.c                        | 468 ++++++++++++++++++++
>  kern/src/schedule.c                                |  27 +-
>  tests/prov.c                                       | 122 ++----
>  tools/dev-util/perf/Makefile                       |   2 +-
>  tools/dev-util/perf/akaros.h                       |  50 ---
>  tools/dev-util/perf/perf.c                         |  27 +-
>  tools/dev-util/perf/perf_core.c                    |   2 +-
>  tools/dev-util/perf/perf_core.h                    |   2 +-
>  tools/dev-util/perf/xlib.c                         | 149 -------
>  tools/dev-util/perf/xlib.h                         |  14 +-
>  .../perf/akaros.c => user/parlib/core_set.c        |  47 ++-
>  user/parlib/include/parlib/bitmask.h               |  21 +-
>  user/parlib/include/parlib/core_set.h              |  43 ++
>  user/parlib/include/parlib/parlib.h                |   2 +
>  user/parlib/include/parlib/x86/bitmask.h           |  20 +-
>  user/parlib/include/parlib/xlib.h                  |  26 ++
>  user/parlib/parlib.c                               |  19 +
>  user/parlib/xlib.c                                 | 163 +++++++
>  26 files changed, 894 insertions(+), 480 deletions(-)
>  create mode 100644 kern/include/corealloc_packed.h
>  create mode 100644 kern/src/corealloc_packed.c
>  delete mode 100644 tools/dev-util/perf/akaros.h
>  rename tools/dev-util/perf/akaros.c => user/parlib/core_set.c (68%)
>  create mode 100644 user/parlib/include/parlib/core_set.h
>  create mode 100644 user/parlib/include/parlib/xlib.h
>  create mode 100644 user/parlib/xlib.c
>
> Thanks,
> Barret
>
>
> On 2015-12-03 at 15:34 Barret Rhoden <[email protected]> wrote:
> > Hi -
> >
> > Couple minor things, listed below.
> >
> >
> > On 2015-11-25 at 06:42 Kevin Klues <[email protected]> wrote:
> > > This change introduces the packed core allocation strategy that Valmon
> > > and I worked on over the summer. The idea is to keep cores allocated
> > > to the same process as tightly packed as possible, while keeping cores
> > > allocated to different processes as far away from each other as
> > > possible.
> > >
> > > The existing FCFS core allocation strategy is still the default, but
> > > the new packed strategy can be selected at compile time via a config
> > > variable (CONFIG_COREALLOC_PACKED).
> > >
> > > The plan is to keep the existing strategy as the default for now, but
> > > migrate to the new strategy once we're able to concretely measure the
> > > benefits of it.
> > >
> > > The changes in this request can be viewed online at:
> > >
> > >     https://github.com/brho/akaros/compare/abe793d...0780d03
> > >
> > > The following changes since commit abe793d7c7507cbd6732ed2d794916ae295523ad:
> > >
> > >   Added test for devarch MSR file (2015-11-24 15:39:05 -0500)
> > >
> > > are available in the git repository at:
> > >
> > >   [email protected]:klueska/akaros corealloc-packed
> >
> > > From eb630798aa2ac398870687ab291d1fbe04034457 Mon Sep 17 00:00:00 2001
> > > From: Kevin Klues <[email protected]>
> > > Date: Mon, 5 Oct 2015 19:42:22 -0700
> > > Subject: Add implementation of packed core alloc strategy
> >
> > > --- /dev/null
> > > +++ b/kern/src/corealloc_packed.c
> > > @@ -0,0 +1,508 @@
> >
> > > +enum pnode_type { CORE, CPU, SOCKET, NUMA, MACHINE, NUM_NODE_TYPES };
> > > +static char pnode_label[5][8] = { "CORE", "CPU", "SOCKET", "NUMA", "MACHINE" };
> >
> > The 5 and 8 seem a little janky.  I guess 5 is NUM_NODE_TYPES and 8 is
> > the max of the strlen + 1?
> >
> > > +/* Create a node and initialize it. This code assumes that children
> > > + * are created before parent nodes. */
> > > +static void init_nodes(int type, int num, int nchildren)
> > > +{
> > > +   /* Initialize the lookup table for this node type. */
> > > +   num_nodes[type] = num;
> > > +   node_lookup[type] = all_nodes;
> > > +   for (int i = CORE; i < type; i++)
> > > +           node_lookup[type] += num_nodes[i];
> >
> > This is a little mystical at first.  From this, it looks like all_nodes
> > is laid out such that all of the nodes are: CORES, then the CPUS, then
> > the SOCKETS, then NUMA, then one for MACHINE.  The nodes at any given
> > layer are later broken up into chunks of n and assigned to their parents
> > (down below).  Can yinz put something in here that explains this layout?
> >
> > > +/* Allocate a table of distances from one core to another. Cores on
> > > + * the same cpu have a distance of 1; same socket have a distance of
> > > + * 2; same numa -> 3; same machine -> 4; */
> > > +static void init_core_distances(void)
> > > +{
> > > +   core_distance = kzmalloc(num_cores * sizeof(int*), 0);
> > > +   if (core_distance == NULL)
> > > +           panic("Out of memory!\n");
> >
> > Can use KMALLOC_WAIT in lieu of panicking.
> >
> > > +   for (int i = 0; i < num_cores; i++) {
> > > +           core_distance[i] = kzmalloc(num_cores * sizeof(int), 0);
> > > +           if (core_distance[i] == NULL)
> > > +                   panic("Out of memory!\n");
> >
> > Can use KMALLOC_WAIT in lieu of panicking.
> >
> > > +   }
> > > +   for (int i = 0; i < num_cores; i++) {
> > > +           for (int j = 0; j < num_cores; j++) {
> > > +                   for (int k = CPU; k <= MACHINE; k++) {
> > > +                           if (i/num_descendants[k][CORE] ==
> > > +                                   j/num_descendants[k][CORE]) {
> > > +                                   core_distance[i][j] = k;
> >
> > This took me a little bit to sort out.
> >
> > Two cores have the same level id (and thus are at distance k) if their
> > IDs divided by the number of cores per level are the same, due to how
> > we number our cores.  I.e., with some suitably documented helper:
> >
> >       int get_level_id(int coreid, int level)
> >       {
> >               return coreid / num_descendants[level][CORE];
> >       }
> >
> >       if (get_level_id(i, k) == get_level_id(j, k))
> >               core_distance[i][j] = k;
> >
> > Can yinz either use a helper or add a line or two to explain that?
> >
> >
> > > +/* Initialize any data associated with doing core allocation. */
> > > +void corealloc_init(void)
> > > +{
> > > +   void *nodes_and_cores;
> > > +
> > > +   /* Allocate a flat array of nodes. */
> > > +   total_nodes = num_cores + num_cpus + num_sockets + num_numa + 1;
> > > +   nodes_and_cores = kmalloc(total_nodes * sizeof(struct sched_pnode) +
> > > +                             num_cores * sizeof(struct sched_pcore), 0);
> >
> > That kmalloc should take either KMALLOC_WAIT or check the return value.
> >
> >
> > > +/* Returns the sum of the distances from one core to all cores in a list. */
> > > +static int cumulative_core_distance(struct sched_pcore *c,
> > > +                                    struct sched_pcore_tailq cl)
> > > +{
> > > +   int d = 0;
> > > +   struct sched_pcore *temp = NULL;
> > > +
> > > +   TAILQ_FOREACH(temp, &cl, alloc_next) {
> >
> > This assumes that the cl list is a list of allocated cores, not any
> > generic list.  Yinz only use it on allocated lists, so that's fine.  We
> > probably need a comment about that though.
> >
> >
> > > +/* Track the pcore properly when it is allocated to p. This code
> > > + * assumes that the scheduler that uses it holds a lock for the
> > > + * duration of the call. */
> > > +void __track_core_alloc(struct proc *p, uint32_t pcoreid)
> > > +{
> > > +   struct sched_pcore *spc;
> > > +
> > > +   assert(pcoreid < num_cores);    /* catch bugs */
> > > +   spc = pcoreid2spc(pcoreid);
> > > +   assert(spc->alloc_proc != p);   /* corruption or double-alloc */
> > > +   spc->alloc_proc = p;
> > > +   /* if the pcore is prov to them and now allocated, move lists */
> > > +   if (spc->prov_proc == p) {
> > > +           TAILQ_REMOVE(&p->ksched_data.crd.prov_not_alloc_me, spc, prov_next);
> > > +           TAILQ_INSERT_TAIL(&p->ksched_data.crd.prov_alloc_me, spc, prov_next);
> > > +   }
> > > +   /* Actually allocate the core, removing it from the idle core list. */
> > > +   TAILQ_INSERT_TAIL(&p->ksched_data.crd.alloc_me, spc, alloc_next);
> > > +   TAILQ_REMOVE(&idlecores, spc, alloc_next);
> >
> > The order of these two might be backwards.  You probably want to remove
> > it before adding it to another list, esp since alloc_next is used for
> > both lists.
> >
> >
> > > +/* Get an idle core from our pcore list and return its core_id. Don't
> > > + * consider the chosen core in the future when handing out cores to a
> > > + * process. This code assumes that the scheduler that uses it holds a
> > > + * lock for the duration of the call. This will not give out
> > > + * provisioned cores. */
> > > +int __get_any_idle_core(void)
> > > +{
> > > +   struct sched_pcore *spc;
> > > +   int ret = -1;
> > > +
> > > +   for (int i = 0; i < num_cores; i++) {
> > > +           spc = &all_pcores[i];
> > > +
> > > +           if (spc->alloc_proc == NULL) {
> > > +                   spc->alloc_proc = UNNAMED_PROC;
> > > +                   ret = spc->core_info->core_id;
> >
> > Does this code also need to do something with the refcnt, like yinz did
> > when initializing the scheduler and removing cores?
> >
> > The steps for removing a core from consideration for allocation could
> > use a helper, which we can then use here.  And we'd need one for
> > returning it (for put_idle_core).
> >
> > Also, do these __get_ routines need to remove the core from the idle
> > core list?
> >
> >
>
> --
> You received this message because you are subscribed to the Google Groups
> "Akaros" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> For more options, visit https://groups.google.com/d/optout.
>
-- 
~Kevin
