Hi -

File this under "better late than never."
I merged this patch and added a few more patches to make the changes I
mentioned in the code review and to fix a couple of other bugs.  For
instance, the topology code wasn't identifying NUMA nodes properly and was
getting turned off - at least on all of my machines.  I fixed that up,
along with a few other topology-related things.

The most user-visible change is that 'prov' grants cores based on a list
(like perf) instead of single items, and it can spawn processes.  That
required moving some perf code to parlib.

Here's an example of prov with the new 'packed' scheduler:

bash-4.3$ prov -tc -v39-46 pthread_test 1000 1000000 10 &
bash-4.3$ prov -s
...
Core 37, prov: 0(0x0000000000000000) alloc: 0(0x0000000000000000)
Core 38, prov: 0(0x0000000000000000) alloc: 42(0xffff8000150d2100)
Core 39, prov: 42(0xffff8000150d2100) alloc: 42(0xffff8000150d2100)
Core 40, prov: 42(0xffff8000150d2100) alloc: 42(0xffff8000150d2100)
Core 41, prov: 42(0xffff8000150d2100) alloc: 42(0xffff8000150d2100)
Core 42, prov: 42(0xffff8000150d2100) alloc: 42(0xffff8000150d2100)
Core 43, prov: 42(0xffff8000150d2100) alloc: 42(0xffff8000150d2100)
Core 44, prov: 42(0xffff8000150d2100) alloc: 42(0xffff8000150d2100)
Core 45, prov: 42(0xffff8000150d2100) alloc: 42(0xffff8000150d2100)
Core 46, prov: 42(0xffff8000150d2100) alloc: 42(0xffff8000150d2100)
Core 47, prov: 0(0x0000000000000000) alloc: 42(0xffff8000150d2100)
Core 48, prov: 0(0x0000000000000000) alloc: 0(0x0000000000000000)
...

Note the scheduler gave out the provisioned cores first (as usual).  When
it looked for more cores, it gave out cores that are hyperthread siblings
of existing cores (38 is a sibling of 39, and 47 is a sibling of 46).  If
it gave out any more cores, they would come from the same socket as the
existing cores.

I haven't tested it much, so we'll see.  Nothing requires a toolchain
rebuild, but some of the parlib header changes do affect it, so rebuild if
you're paranoid.
The following changes since commit ef39a1bbb55afb3e89f311e474cd8cb906cb49ca:

  vmm: Provide a fast-path for IPIs in the kernel (2017-12-12 14:51:57 -0500)

are available in the Git repository at:

  [email protected]:brho/akaros.git topo

for you to fetch changes up to 3d407708b8bc4bb5f202aad7582f1a40034e37a3:

  sched: Slightly fix up tests/prov (2017-12-13 17:34:46 -0500)

----------------------------------------------------------------
View this online at:

https://github.com/brho/akaros/compare/ef39a1bbb55a...3d407708b8bc

----------------------------------------------------------------
Barret Rhoden (12):
      vmx: Squelch per-core startup messages
      x86: Fix topology detection
      sched: Remove the idle core interface
      sched: Fix packed initialization
      sched: When disabling SMT, turn off the odd cores
      sched: Touch up the packed scheduler
      parlib: Make bitmask.h more compilable
      parlib: Move perf's xlib to parlib
      perf: Rename the ros_ core_set code
      parlib: Move core_sets to parlib
      parlib: Move the provisioning of cores to a PID
      sched: Slightly fix up tests/prov

Kevin Klues (1):
      Add implementation of packed core alloc strategy

 Kconfig                                         |   8 +
 kern/arch/x86/topology.c                        |  11 +-
 kern/arch/x86/vmm/intel/vmx.c                   |   1 -
 kern/include/corealloc_packed.h                 |  46 ++
 kern/include/corerequest.h                      |  11 +-
 kern/include/schedule.h                         |  10 -
 kern/src/Kbuild                                 |   1 +
 kern/src/corealloc_fcfs.c                       |  82 +---
 kern/src/corealloc_packed.c                     | 468 +++++++++++++++++++++
 kern/src/schedule.c                             |  27 +-
 tests/prov.c                                    | 122 ++----
 tools/dev-util/perf/Makefile                    |   2 +-
 tools/dev-util/perf/akaros.h                    |  50 ---
 tools/dev-util/perf/perf.c                      |  27 +-
 tools/dev-util/perf/perf_core.c                 |   2 +-
 tools/dev-util/perf/perf_core.h                 |   2 +-
 tools/dev-util/perf/xlib.c                      | 149 -------
 tools/dev-util/perf/xlib.h                      |  14 +-
 .../perf/akaros.c => user/parlib/core_set.c     |  47 ++-
 user/parlib/include/parlib/bitmask.h            |  21 +-
 user/parlib/include/parlib/core_set.h           |  43 ++
 user/parlib/include/parlib/parlib.h             |   2 +
 user/parlib/include/parlib/x86/bitmask.h        |  20 +-
 user/parlib/include/parlib/xlib.h               |  26 ++
 user/parlib/parlib.c                            |  19 +
 user/parlib/xlib.c                              | 163 +++++++
 26 files changed, 894 insertions(+), 480 deletions(-)
 create mode 100644 kern/include/corealloc_packed.h
 create mode 100644 kern/src/corealloc_packed.c
 delete mode 100644 tools/dev-util/perf/akaros.h
 rename tools/dev-util/perf/akaros.c => user/parlib/core_set.c (68%)
 create mode 100644 user/parlib/include/parlib/core_set.h
 create mode 100644 user/parlib/include/parlib/xlib.h
 create mode 100644 user/parlib/xlib.c

Thanks,

Barret

On 2015-12-03 at 15:34 Barret Rhoden <[email protected]> wrote:
> Hi -
>
> Couple minor things, listed below.
>
>
> On 2015-11-25 at 06:42 Kevin Klues <[email protected]> wrote:
> > This change introduces the packed core allocation strategy that Valmon
> > and I worked on over the summer.  The idea is to keep cores allocated
> > to the same process as tightly packed as possible, while keeping cores
> > allocated to different processes as far away from each other as
> > possible.
> >
> > The existing FCFS core allocation strategy is still the default, but
> > the new packed strategy can be selected at compile time via a config
> > variable (CONFIG_COREALLOC_PACKED).
> >
> > The plan is to keep the existing strategy as the default for now, but
> > migrate to the new strategy once we're able to concretely measure the
> > benefits of it.
> >
> > The changes in this request can be viewed online at:
> >
> > https://github.com/brho/akaros/compare/abe793d...0780d03
> >
> > The following changes since commit abe793d7c7507cbd6732ed2d794916ae295523ad:
> >
> >   Added test for devarch MSR file (2015-11-24 15:39:05 -0500)
> >
> > are available in the git repository at:
> >
> >   [email protected]:klueska/akaros corealloc-packed
>
> > From eb630798aa2ac398870687ab291d1fbe04034457 Mon Sep 17 00:00:00 2001
> > From: Kevin Klues <[email protected]>
> > Date: Mon, 5 Oct 2015 19:42:22 -0700
> > Subject: Add implementation of packed core alloc strategy
>
> > --- /dev/null
> > +++ b/kern/src/corealloc_packed.c
> > @@ -0,0 +1,508 @@
>
> > +enum pnode_type { CORE, CPU, SOCKET, NUMA, MACHINE, NUM_NODE_TYPES };
> > +static char pnode_label[5][8] = { "CORE", "CPU", "SOCKET", "NUMA",
> > +                                  "MACHINE" };
>
> The 5 and 8 seem a little janky.  I guess 5 is NUM_NODE_TYPES and 8 is the
> max of the strlen + 1?
>
> > +/* Create a node and initialize it.  This code assumes that child nodes
> > + * are created before parent nodes. */
> > +static void init_nodes(int type, int num, int nchildren)
> > +{
> > +	/* Initialize the lookup table for this node type. */
> > +	num_nodes[type] = num;
> > +	node_lookup[type] = all_nodes;
> > +	for (int i = CORE; i < type; i++)
> > +		node_lookup[type] += num_nodes[i];
>
> This is a little mystical at first.  From this, it looks like all_nodes is
> laid out such that all of the nodes are: CORES, then the CPUs, then the
> SOCKETs, then NUMA, then one for MACHINE.  The nodes at any given layer are
> later broken up into chunks of n and assigned to their parents (down
> below).  Can yinz put something in here that explains this layout?
>
> > +/* Allocate a table of distances from one core to another.
> > + * Cores on the same cpu have a distance of 1; same socket have a
> > + * distance of 2; same numa -> 3; same machine -> 4. */
> > +static void init_core_distances(void)
> > +{
> > +	core_distance = kzmalloc(num_cores * sizeof(int*), 0);
> > +	if (core_distance == NULL)
> > +		panic("Out of memory!\n");
>
> Can use KMALLOC_WAIT in lieu of panicking.
>
> > +	for (int i = 0; i < num_cores; i++) {
> > +		core_distance[i] = kzmalloc(num_cores * sizeof(int), 0);
> > +		if (core_distance[i] == NULL)
> > +			panic("Out of memory!\n");
>
> Can use KMALLOC_WAIT in lieu of panicking.
>
> > +	}
> > +	for (int i = 0; i < num_cores; i++) {
> > +		for (int j = 0; j < num_cores; j++) {
> > +			for (int k = CPU; k <= MACHINE; k++) {
> > +				if (i/num_descendants[k][CORE] ==
> > +				    j/num_descendants[k][CORE]) {
> > +					core_distance[i][j] = k;
>
> This took me a little bit to sort out.
>
> Two cores have the same level id (and thus are at distance k) if their IDs,
> divided by the number of cores per level, are the same, due to how we
> number our cores.  i.e., with some suitably documented helper:
>
> 	int get_level_id(int coreid, int level)
> 	{
> 		return coreid / num_descendants[level][CORE];
> 	}
>
> 	if (get_level_id(i, k) == get_level_id(j, k))
> 		core_distance[i][j] = k;
>
> Can yinz either use a helper or add a line or two to explain that?
>
> > +/* Initialize any data associated with doing core allocation. */
> > +void corealloc_init(void)
> > +{
> > +	void *nodes_and_cores;
> > +
> > +	/* Allocate a flat array of nodes. */
> > +	total_nodes = num_cores + num_cpus + num_sockets + num_numa + 1;
> > +	nodes_and_cores = kmalloc(total_nodes * sizeof(struct sched_pnode) +
> > +				  num_cores * sizeof(struct sched_pcore), 0);
>
> That kmalloc should take either KMALLOC_WAIT or check the return value.
>
> > +/* Returns the sum of the distances from one core to all cores in a list.
> > + */
> > +static int cumulative_core_distance(struct sched_pcore *c,
> > +                                    struct sched_pcore_tailq cl)
> > +{
> > +	int d = 0;
> > +	struct sched_pcore *temp = NULL;
> > +
> > +	TAILQ_FOREACH(temp, &cl, alloc_next) {
>
> This assumes that the cl list is a list of allocated cores, not any generic
> list.  Yinz only use it on allocated lists, so that's fine.  We probably
> need a comment about that though.
>
> > +/* Track the pcore properly when it is allocated to p.  This code
> > + * assumes that the scheduler that uses it holds a lock for the duration
> > + * of the call. */
> > +void __track_core_alloc(struct proc *p, uint32_t pcoreid)
> > +{
> > +	struct sched_pcore *spc;
> > +
> > +	assert(pcoreid < num_cores);	/* catch bugs */
> > +	spc = pcoreid2spc(pcoreid);
> > +	assert(spc->alloc_proc != p);	/* corruption or double-alloc */
> > +	spc->alloc_proc = p;
> > +	/* if the pcore is prov to them and now allocated, move lists */
> > +	if (spc->prov_proc == p) {
> > +		TAILQ_REMOVE(&p->ksched_data.crd.prov_not_alloc_me, spc,
> > +			     prov_next);
> > +		TAILQ_INSERT_TAIL(&p->ksched_data.crd.prov_alloc_me, spc,
> > +				  prov_next);
> > +	}
> > +	/* Actually allocate the core, removing it from the idle core
> > +	 * list. */
> > +	TAILQ_INSERT_TAIL(&p->ksched_data.crd.alloc_me, spc, alloc_next);
> > +	TAILQ_REMOVE(&idlecores, spc, alloc_next);
>
> The order of these two might be backwards.  You probably want to remove it
> before adding it to another list, esp since alloc_next is used for both
> lists.
>
> > +/* Get an idle core from our pcore list and return its core_id.  Don't
> > + * consider the chosen core in the future when handing out cores to a
> > + * process.  This code assumes that the scheduler that uses it holds a
> > + * lock for the duration of the call.  This will not give out
> > + * provisioned cores.
> > + */
> > +int __get_any_idle_core(void)
> > +{
> > +	struct sched_pcore *spc;
> > +	int ret = -1;
> > +
> > +	for (int i = 0; i < num_cores; i++) {
> > +		spc = &all_pcores[i];
> > +
> > +		if (spc->alloc_proc == NULL) {
> > +			spc->alloc_proc = UNNAMED_PROC;
> > +			ret = spc->core_info->core_id;
>
> Does this code also need to do something with the refcnt, like yinz did
> when initializing the scheduler and removing cores?
>
> The steps for removing a core from consideration for allocation could use
> a helper, which we can then use here.  And we'd need one for returning it
> (for put_idle_core).
>
> Also, do these __get_ routines need to remove the core from the idle core
> list?

--
You received this message because you are subscribed to the Google Groups "Akaros" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To post to this group, send email to [email protected].
For more options, visit https://groups.google.com/d/optout.
