Hi -

File this under "better late than never."  

I merged this patch and added a few more patches to make the changes I
mentioned in the code review and to fix a couple other bugs.  

For instance, the topology code wasn't identifying NUMA nodes properly
and was turning NUMA off - at least on all of my machines.  I fixed that
up, along with a few other topology-related things.

The most user-visible change is that 'prov' grants cores based on a list
(like perf) instead of single items, and it can spawn processes.  That
required moving some perf code to parlib.

Here's an example of prov with the new 'packed' scheduler:

bash-4.3$ prov -tc -v39-46 pthread_test 1000 1000000 10 &

bash-4.3$ prov -s
...
Core 37, prov: 0(0x0000000000000000) alloc: 0(0x0000000000000000)
Core 38, prov: 0(0x0000000000000000) alloc: 42(0xffff8000150d2100)
Core 39, prov: 42(0xffff8000150d2100) alloc: 42(0xffff8000150d2100)
Core 40, prov: 42(0xffff8000150d2100) alloc: 42(0xffff8000150d2100)
Core 41, prov: 42(0xffff8000150d2100) alloc: 42(0xffff8000150d2100)
Core 42, prov: 42(0xffff8000150d2100) alloc: 42(0xffff8000150d2100)
Core 43, prov: 42(0xffff8000150d2100) alloc: 42(0xffff8000150d2100)
Core 44, prov: 42(0xffff8000150d2100) alloc: 42(0xffff8000150d2100)
Core 45, prov: 42(0xffff8000150d2100) alloc: 42(0xffff8000150d2100)
Core 46, prov: 42(0xffff8000150d2100) alloc: 42(0xffff8000150d2100)
Core 47, prov: 0(0x0000000000000000) alloc: 42(0xffff8000150d2100)
Core 48, prov: 0(0x0000000000000000) alloc: 0(0x0000000000000000)
...

Note the scheduler gave out the provisioned cores first (as usual).
When it looked for more cores, it gave out cores that are hyperthread
siblings of existing cores (38 is a sibling of 39, and 47 is a sibling
of 46).  If it had to give out any more cores, they would come from the
same socket as the existing cores.

I haven't tested it much, so we'll see.  Nothing requires a toolchain
rebuild, but some of the parlib header changes do affect it.  So
rebuild if you're paranoid.


The following changes since commit ef39a1bbb55afb3e89f311e474cd8cb906cb49ca:

  vmm: Provide a fast-path for IPIs in the kernel (2017-12-12 14:51:57 -0500)

are available in the Git repository at:

  [email protected]:brho/akaros.git topo

for you to fetch changes up to 3d407708b8bc4bb5f202aad7582f1a40034e37a3:

  sched: Slightly fix up tests/prov (2017-12-13 17:34:46 -0500)

----------------------------------------------------------------
View this online at:
https://github.com/brho/akaros/compare/ef39a1bbb55a...3d407708b8bc

----------------------------------------------------------------
Barret Rhoden (12):
      vmx: Squelch per-core startup messages
      x86: Fix topology detection
      sched: Remove the idle core interface
      sched: Fix packed initialization
      sched: When disabling SMT, turn off the odd cores
      sched: Touch up the packed scheduler
      parlib: Make bitmask.h more compilable
      parlib: Move perf's xlib to parlib
      perf: Rename the ros_ core_set code
      parlib: Move core_sets to parlib
      parlib: Move the provisioning of cores to a PID
      sched: Slightly fix up tests/prov

Kevin Klues (1):
      Add implementation of packed core alloc strategy

 Kconfig                                            |   8 +
 kern/arch/x86/topology.c                           |  11 +-
 kern/arch/x86/vmm/intel/vmx.c                      |   1 -
 kern/include/corealloc_packed.h                    |  46 ++
 kern/include/corerequest.h                         |  11 +-
 kern/include/schedule.h                            |  10 -
 kern/src/Kbuild                                    |   1 +
 kern/src/corealloc_fcfs.c                          |  82 +---
 kern/src/corealloc_packed.c                        | 468 +++++++++++++++++++++
 kern/src/schedule.c                                |  27 +-
 tests/prov.c                                       | 122 ++----
 tools/dev-util/perf/Makefile                       |   2 +-
 tools/dev-util/perf/akaros.h                       |  50 ---
 tools/dev-util/perf/perf.c                         |  27 +-
 tools/dev-util/perf/perf_core.c                    |   2 +-
 tools/dev-util/perf/perf_core.h                    |   2 +-
 tools/dev-util/perf/xlib.c                         | 149 -------
 tools/dev-util/perf/xlib.h                         |  14 +-
 .../perf/akaros.c => user/parlib/core_set.c        |  47 ++-
 user/parlib/include/parlib/bitmask.h               |  21 +-
 user/parlib/include/parlib/core_set.h              |  43 ++
 user/parlib/include/parlib/parlib.h                |   2 +
 user/parlib/include/parlib/x86/bitmask.h           |  20 +-
 user/parlib/include/parlib/xlib.h                  |  26 ++
 user/parlib/parlib.c                               |  19 +
 user/parlib/xlib.c                                 | 163 +++++++
 26 files changed, 894 insertions(+), 480 deletions(-)
 create mode 100644 kern/include/corealloc_packed.h
 create mode 100644 kern/src/corealloc_packed.c
 delete mode 100644 tools/dev-util/perf/akaros.h
 rename tools/dev-util/perf/akaros.c => user/parlib/core_set.c (68%)
 create mode 100644 user/parlib/include/parlib/core_set.h
 create mode 100644 user/parlib/include/parlib/xlib.h
 create mode 100644 user/parlib/xlib.c

Thanks,
Barret


On 2015-12-03 at 15:34 Barret Rhoden <[email protected]> wrote:
> Hi -
> 
> Couple minor things, listed below.
> 
> 
> On 2015-11-25 at 06:42 Kevin Klues <[email protected]> wrote:
> > This change introduces the packed core allocation strategy that Valmon
> > and I worked on over the summer. The idea is to keep cores allocated
> > to the same process as tightly packed as possible, while keeping cores
> > allocated to different processes as far away from each other as
> > possible.
> > 
> > The existing FCFS core allocation strategy is still the default, but
> > the new packed strategy can be selected at compile time via a config
> > variable (CONFIG_COREALLOC_PACKED).
> > 
> > The plan is to keep the existing strategy as the default for now, but
> > migrate to the new strategy once we're able to concretely measure the
> > benefits of it.
> > 
> > The changes in this request can be viewed online at:
> > 
> >     https://github.com/brho/akaros/compare/abe793d...0780d03
> > 
> > The following changes since commit
> > abe793d7c7507cbd6732ed2d794916ae295523ad:
> > 
> >   Added test for devarch MSR file (2015-11-24 15:39:05 -0500)
> > 
> > are available in the git repository at:
> > 
> >   [email protected]:klueska/akaros corealloc-packed
> 
> > From eb630798aa2ac398870687ab291d1fbe04034457 Mon Sep 17 00:00:00 2001
> > From: Kevin Klues <[email protected]>
> > Date: Mon, 5 Oct 2015 19:42:22 -0700
> > Subject: Add implementation of packed core alloc strategy
> 
> > --- /dev/null
> > +++ b/kern/src/corealloc_packed.c
> > @@ -0,0 +1,508 @@
> 
> > +enum pnode_type { CORE, CPU, SOCKET, NUMA, MACHINE, NUM_NODE_TYPES };
> > +static char pnode_label[5][8] = { "CORE", "CPU", "SOCKET", "NUMA", "MACHINE" };
> 
> The 5 and 8 seem a little janky.  I guess 5 is NUM_NODE_TYPES and 8 is the max
> of the strlen + 1?
> 
> > +/* Create a node and initialize it. This code assumes that child nodes are
> > + * created before parent nodes. */
> > +static void init_nodes(int type, int num, int nchildren)
> > +{
> > +   /* Initialize the lookup table for this node type. */
> > +   num_nodes[type] = num;
> > +   node_lookup[type] = all_nodes;
> > +   for (int i = CORE; i < type; i++)
> > +           node_lookup[type] += num_nodes[i];
> 
> This is a little mystical at first.  From this, it looks like all_nodes is 
> laid
> out such that all of the nodes are: CORES, then the CPUS, then the SOCKETS, 
> then
> NUMA, then one for MACHINE.  The nodes at any given layer are later broken up
> into chunks of n and assigned to their parents (down below).  Can yinz put
> something in here that explains this layout?
> 
> > +/* Allocate a table of distances from one core to another. Cores on the same
> > + * cpu have a distance of 1; same socket have a distance of 2; same numa -> 3;
> > + * same machine -> 4; */
> > +static void init_core_distances(void)
> > +{
> > +   core_distance = kzmalloc(num_cores * sizeof(int*), 0);
> > +   if (core_distance == NULL)
> > +           panic("Out of memory!\n");
> 
> Can use KMALLOC_WAIT in lieu of panicking.
> 
> > +   for (int i = 0; i < num_cores; i++) {
> > +           core_distance[i] = kzmalloc(num_cores * sizeof(int), 0);
> > +           if (core_distance[i] == NULL)
> > +                   panic("Out of memory!\n");
> 
> Can use KMALLOC_WAIT in lieu of panicking.
> 
> > +   }
> > +   for (int i = 0; i < num_cores; i++) {
> > +           for (int j = 0; j < num_cores; j++) {
> > +                   for (int k = CPU; k <= MACHINE; k++) {
> > +                           if (i/num_descendants[k][CORE] ==
> > +                                   j/num_descendants[k][CORE]) {
> > +                                   core_distance[i][j] = k;
> 
> This took me a little bit to sort out.
> 
> Two cores have the same level id (and are thus at distance k) if their IDs
> divided by the number of cores per level are the same, due to how we number
> our cores.  i.e., with some suitably documented helper:
> 
>       int get_level_id(int coreid, int level)
>       {
>               return coreid / num_descendants[level][CORE];
>       }
> 
>       if (get_level_id(i, k) == get_level_id(j, k))
>               core_distance[i][j] = k;
> 
> Can yinz either use a helper or add a line or two to explain that?
> 
> 
> > +/* Initialize any data associated with doing core allocation. */
> > +void corealloc_init(void)
> > +{
> > +   void *nodes_and_cores;
> > +
> > +   /* Allocate a flat array of nodes. */
> > +   total_nodes = num_cores + num_cpus + num_sockets + num_numa + 1;
> > +   nodes_and_cores = kmalloc(total_nodes * sizeof(struct sched_pnode) +
> > +                             num_cores * sizeof(struct sched_pcore), 0);
> 
> That kmalloc should take either KMALLOC_WAIT or check the return value.
> 
> 
> > +/* Returns the sum of the distances from one core to all cores in a list. */
> > +static int cumulative_core_distance(struct sched_pcore *c,
> > +                                    struct sched_pcore_tailq cl)
> > +{
> > +   int d = 0;
> > +   struct sched_pcore *temp = NULL;
> > +
> > +   TAILQ_FOREACH(temp, &cl, alloc_next) {
> 
> This assumes that the cl list is a list of allocated cores, not any generic
> list.  Yinz only use it on allocated lists, so that's fine.  We probably
> need a comment about that though.
> 
> 
> > +/* Track the pcore properly when it is allocated to p. This code assumes that
> > + * the scheduler that uses it holds a lock for the duration of the call. */
> > +void __track_core_alloc(struct proc *p, uint32_t pcoreid)
> > +{
> > +   struct sched_pcore *spc;
> > +
> > +   assert(pcoreid < num_cores);    /* catch bugs */
> > +   spc = pcoreid2spc(pcoreid);
> > +   assert(spc->alloc_proc != p);   /* corruption or double-alloc */
> > +   spc->alloc_proc = p;
> > +   /* if the pcore is prov to them and now allocated, move lists */
> > +   if (spc->prov_proc == p) {
> > +           TAILQ_REMOVE(&p->ksched_data.crd.prov_not_alloc_me, spc, prov_next);
> > +           TAILQ_INSERT_TAIL(&p->ksched_data.crd.prov_alloc_me, spc, prov_next);
> > +   }
> > +   /* Actually allocate the core, removing it from the idle core list. */
> > +   TAILQ_INSERT_TAIL(&p->ksched_data.crd.alloc_me, spc, alloc_next);
> > +   TAILQ_REMOVE(&idlecores, spc, alloc_next);
> 
> The order of these two might be backwards.  You probably want to remove it
> before adding it to another list, esp since alloc_next is used for both lists.
> 
> 
> > +/* Get an idle core from our pcore list and return its core_id. Don't
> > + * consider the chosen core in the future when handing out cores to a
> > + * process. This code assumes that the scheduler that uses it holds a lock
> > + * for the duration of the call. This will not give out provisioned cores. */
> > +int __get_any_idle_core(void)
> > +{
> > +   struct sched_pcore *spc;
> > +   int ret = -1;
> > +
> > +   for (int i = 0; i < num_cores; i++) {
> > +           struct sched_pcore *c = &all_pcores[i];
> > +
> > +           if (spc->alloc_proc == NULL) {
> > +                   spc->alloc_proc = UNNAMED_PROC;
> > +                   ret = spc->core_info->core_id;
> 
> Does this code also need to do something with the refcnt, like yinz did when
> initializing the scheduler and removing cores?
> 
> The steps for removing a core from consideration for allocation could use a
> helper, which we can then use here.  And we'd need one for returning it (for
> put_idle_core).
> 
> Also, do these __get_ routines need to remove the core from the idle core 
> list?
> 
> 

-- 
You received this message because you are subscribed to the Google Groups 
"Akaros" group.
