On Fri, Jan 06, 2017 at 07:51:05PM +0100, Lennart Poettering wrote: > On Fri, 06.01.17 11:59, Marcelo Tosatti (mtosa...@redhat.com) wrote: > > > Cache Allocation Technology is a feature on selected recent Intel Xeon > > processors which allows control over L3 cache allocation. > > What precisely is the benefit of making this configurable? Can you > describe a basic usecase why I'd want to set this on some service?
http://www.intel.com/content/www/us/en/communications/cache-allocation-technology-white-paper.html "On systems with multiple workloads running simultaneously, which heavily utilize the cache; system administrators can utilize CAT to improve performance. CAT can be used to better utilize the caches and as a result some applications are more likely to hit the cache which results in fewer accesses to system memory. This results in lower latency and greater performance." -------- For example, with a job with large memory reaccess distances (where caching hardly brings any benefit), you can limit that job to reclaim a small portion of L3 cache (so you don't throw away data from other jobs from L3 cache). > > > Index: systemd/man/systemd.exec.xml > > =================================================================== > > --- systemd.orig/man/systemd.exec.xml > > +++ systemd/man/systemd.exec.xml > > Hmm, this doesn't look like a git patch? We generally prefer > submissions as github PRs these days, btw. > > > @@ -233,6 +233,31 @@ > > </varlistentry> > > > > <varlistentry> > > + <term><varname>RDTCacheReservation=</varname></term> > > + > > + <listitem><para>Creates a L3 CAT reservation in resctrlfs > > + for the executed processes. Takes a ";" separated list of > > + tuples (optionally triples) containing type,size and > > + optionally cacheid: > > if I read this, then I am puzzled what "RDT Cache" is, what "L3 CAT" > is. Can you explain? RDT: Resource Director Technology(RDT) CAT: Cache Allocation Technology (CAT), a subcomponent of RDT (which includes other things such as CMT: Cache Monitoring Technology). > > > + type=type,size=size,cacheid=id; > > + type=type,size=size,cacheid=id;... > > + > > + Where: > > + > > + type is one of: both, data, code. > > + size is size in kbytes. > > + cacheid (optional) is the cacheid for > > + the reservation. > > what is a "cacheid"? Cache IDs --------- On current generation systems there is one L3 cache per socket and L2 caches are generally just shared by the hyperthreads on a core, but this isn't an architectural requirement. We could have multiple separate L3 caches on a socket, multiple cores could share an L2 cache. So instead of using "socket" or "core" to define the set of logical cpus sharing a resource we use a "Cache ID". At a given cache level this will be a unique number across the whole system (but it isn't guaranteed to be a contiguous sequence, there may be gaps). To find the ID for each logical CPU look in /sys/devices/system/cpu/cpu*/cache/index*/id > > > + > > + Rules for the parameters: > > + * type=code must be paired with type=data entry. > > + > > + See the output of resctrltool for more details. > > + > > + </para></listitem> > > + </varlistentry> > > This syntax doesn't appear to be in line with how we'd do this in > systemd usually. Ideally, we try to hide the fact that our settings > syntax is one-dimensional as much as we can. Hence, a syntax like this > sounds more appropriate to me: > > RDTCacheReservation=4K code > RDTCacheReservation=16K data > > and so on... What you suggest is to essentially to split RDTCacheReservation=type=type,size=size,cacheid=id;type=type,size=size,cacheid=id Into RDTCacheReservation=type=type,size=size,cacheid=id RDTCacheReservation=type=type,size=size,cacheid=id (yes sure can get rid of the commas as well, looks better). > But then again, I have no idea what all of this actually really is, > hence I am not sure my proposed syntax makes much sense. > > sizes are generally parsed with parse_bytes() in systemd, so that K, > M, G suffixes are properly recognized... > > > + ExecContext *c = userdata; > > + int r; > > + int i; > > + > > + assert(bus); > > + assert(reply); > > + assert(c); > > + > > + r = sd_bus_message_open_container(reply, 'a', "s"); > > + if (r < 0) > > + return r; > > + > > + for (i = 0; i < c->argmidx; i++) { > > + r = sd_bus_message_append(reply, "s", c->argm[i]); > > + if (r < 0) > > + return r; > > + } > > indentation is borked. Please indent to multiples of 8ch. > > > +static char* convert_rdt_back(char *argm[], int argmidx) > > +{ > > Please follow coding style, place opening bracket on same name as > funciton name. > > > + int i, ret; > > + char *buf, *str; > > + int size = 0; > > + int printcomma = 0; > > + int printsemicolon = 0; > > + int localmode = 0; > > Please use C99 "bool" when you actually mean a boolean. > > > + > > + for (i=0; i < argmidx; i++) { > > + /* ',', ';', '=' */ > > + size += strlen(argm[i]) + 3; > > + } > > + > > + buf = malloc(size); > > + if (buf == NULL) > > + return NULL; > > + memset(buf, 0, size); > > Use "buf = new0(char, size)" for allocations of this kind. > > > + > > + str = buf; > > + > > + /* cache-id specified */ > > + for (i=0; i < argmidx; i++) { > > + if (strlen(argm[i]) == 10) { > > + if (strcmp(argm[i], "--cache-id") == 0) > > + localmode = 1; > > + } > > + } > > Please use "streq()" instead of strcmp() to compare strings for equality. > > > + > > + for (i=0; i<argmidx; i++) { > > + char *cur = argm[i]; > > + > > + if (strlen(cur) == 0) > > + return buf; > > + > > + if (strlen(cur) == 6) { > > + if (strcmp(cur, "--type") == 0) { > > + ret = sprintf(str, "type="); > > + str = str + ret; > > + printcomma = 1; > > + continue; > > + } > > + > > + if (strcmp(cur, "--size") == 0) { > > + ret = sprintf(str, "size="); > > + str = str + ret; > > + if (localmode == 1) > > + printcomma = 1; > > + else > > + printsemicolon = 1; > > + > > + continue; > > + } > > + } > > + > > + if (strlen(cur) == 10) { > > + if (strcmp(cur, "--cache-id") == 0) { > > + ret = sprintf(str, "cache-id="); > > + str = str + ret; > > + printsemicolon = 1; > > + continue; > > + } > > + } > > + > > + ret = sprintf(str, "%s", cur); > > + str = str + ret; > > + > > + if (i != argmidx - 1) { > > + if (printsemicolon) { > > + ret = sprintf(str, ";"); > > + str = str + 1; > > + } else if (printcomma) { > > + ret = sprintf(str, ","); > > + str = str + 1; > > + } > > + > > + printcomma = 0; > > + printsemicolon = 0; > > + } > > + } > > + > > + return buf; > > +} > > + > > int bus_exec_context_set_transient_property( > > Unit *u, > > ExecContext *c, > > @@ -1377,6 +1492,46 @@ int bus_exec_context_set_transient_prope > > } > > > > return 1; > > + } else if (streq(name, "RDTCacheReservation")) { > > + > > + r = sd_bus_message_enter_container(message, 'a', "s"); > > + if (r < 0) > > + return r; > > + > > + while ((r = sd_bus_message_enter_container(message, 'r', > > "s")) > 0) { > > + const char *str; > > + > > + r = sd_bus_message_read(message, "s", &str); > > + if (r < 0) > > + return r; > > + > > + if (mode != UNIT_CHECK) > > + c->argm[c->argmidx++] = strdup(str); > > Indentation borked. Missing OOM check. > > > + > > + r = sd_bus_message_exit_container(message); > > + if (r < 0) > > + return r; > > + } > > + > > + if (mode != UNIT_CHECK) { > > + char *conv; > > + > > + conv = convert_rdt_back(c->argm, c->argmidx); > > + if (conv == NULL) { > > + return sd_bus_error_setf(error, > > + SD_BUS_ERROR_NO_MEMORY, > > + "out of memory for rdt string > > convertion"); > > you may return -ENOMEM here, the right thing will happen. > > > + } > > + > > + unit_write_drop_in_private_format(u, mode, name, > > "RDTCacheReservation=%s", conv); > > + free(conv); > > + } > > indentation really borked. > > > > +static int fork_exec_resctrltool(Unit *u, char *argv[]) > > +{ > > coding style... opening bracket on same name as function. > > > + int outpipe[2]; > > + int errpipe[2]; > > + pid_t cpid; > > + > > + pipe(outpipe); > > + pipe(errpipe); > > missing error checks. > > > + cpid = fork(); > > Umpff... This is not how we work in systemd. We do not glue things > together form the various system tools we shell out to. Sorry, but > this isn't OK to add this way. Fine, will write a C library. > If access to the reservation file system is straight-forward, please > add some glue code for it to src/basic/ or so, so that we can do the > accesses directly. Or, alternatively, prepare a proper C library that > we can link against, maintained as part of resctrltool. > > Sorry, but this is a total blocker. Either we do the fs access > oursvles, or we call into a properly designed C library we link > against, but shelling out external tools is not an option. > > > + if(cpid == 0) { > > + int r; > > + > > + dup2(errpipe[1], STDERR_FILENO); > > + dup2(outpipe[1], STDOUT_FILENO); > > error checking... > > > + > > + r = execve(argv[0], argv, NULL); > > + if (r == -1) { > > + log_open(); > > + log_unit_error_errno(u, r, "Failed to execve > > resctrltool, ignoring: %m"); > > + log_close(); > > + return -errno; > > + } > > + } else { > > + char buffer[100]; > > arbitrarily sized buffers... are urks. please use LINE_MAX or so at least.... > > > + fd_set fdset; > > + int count; > > + int ret; > > + int nfds; > > + int status = 0; > > + int retst; > > + > > + ret = waitpid(cpid, &status, 0); > > + if (ret == -1) { > > + log_open(); > > + log_unit_error_errno(u, ret, "Failed to waitpid > > resctrltool, ignoring: %m"); > > + log_close(); > > + return -errno; > > + } > > + > > + if (!WIFEXITED(status)) { > > + log_open(); > > + log_unit_error_errno(u, ret, "resctrltool exits > > abnormally, ignoring: %m"); > > + log_close(); > > + return -1; > > + } else { > > + retst = WEXITSTATUS(status); > > + > > + if (retst == 0) { > > + return 0; > > + } > > + } > > + > > + /* > > + * error, read stderr and stdout and log. > > + */ > > + > > + FD_ZERO(&fdset); > > + FD_SET(outpipe[0], &fdset); > > + FD_SET(errpipe[0], &fdset); > > + > > + if (outpipe[0] > errpipe[0]) > > + nfds = outpipe[0] + 1; > > + else > > + nfds = errpipe[0] + 1; > > + > > + ret = select(nfds, &fdset, NULL, NULL, NULL); > > select() is not an option. We regularly have to deal with fds > 1024, > and select() doesn't support those. it's not just slow, it simply > won't work. > > Really, select() is obsolete IRL. > > > + if (ret == -1) { > > + log_open(); > > + log_unit_error_errno(u, ret, "select error, > > ignoring RDTCacheReservation: %m"); > > + log_close(); > > + return -1; > > + } > > + > > + if (FD_ISSET(outpipe[0], &fdset)) { > > + count = read(outpipe[0], buffer, sizeof(buffer)-1); > > + if (count >= 0) { > > + buffer[count] = 0; > > + log_open(); > > + log_unit_error(u, "resctrltool stdout: > > %s", buffer); > > + log_close(); > > + } else { > > + log_open(); > > + log_unit_error(u, "resctrltool io > > error\n"); > > + log_close(); > > + } > > + } > > + > > + if (FD_ISSET(errpipe[0], &fdset)) { > > + count = read(errpipe[0], buffer, sizeof(buffer)-1); > > + if (count >= 0) { > > + buffer[count] = 0; > > + log_open(); > > + log_unit_error(u, "resctrltool stderr: > > %s", buffer); > > + log_close(); > > + } else { > > + log_open(); > > + log_unit_error(u, "resctrltool io > > error\n"); > > + log_close(); > > + } > > + } > > + > > + return retst; > > > This will deadlock if the tool prints more than PIPE_BUF output... > > > + } > > + > > + return -1; > > Please do not make up fake error codes such as "-1". We pretty > universally follow kernel style "return -EFOOBAR". > > > +} > > + > > +static int setup_rdt_cat(Unit *u, const ExecContext *c, pid_t pid) > > +{ > > + int ret, i; > > + const char *arg[5]; > > + char pidstr[100]; > > fixed size arrays... If you only write a pid into this, then use > DECIMAL_STR_MAX(pid_t)... > > > + char *name = u->id; > > + const char *largm[ARGMSIZE + 4]; > > + > > + sprintf(pidstr, "%d", pid); > > Use PID_FMT for this... > > > + > > + arg[0] = "/usr/bin/resctrltool"; > > + arg[1] = "delete"; > > + arg[2] = name; > > + arg[3] = 0; > > + > > + ret = fork_exec_resctrltool(u, (char **)arg); > > + /* we want to ignore 'reservation does not exist' > > + * errors, so skip error checking entirely. > > + * any other errors will be caught when trying > > + * to execute create below > > + */ > > + > > + memset(largm, 0, sizeof(largm)); > > Use memzero()... > > Please have a look at CODING_STYLE. > > And please: the shelling out is not acceptable. Sorry! > > Lennart > > -- > Lennart Poettering, Red Hat _______________________________________________ systemd-devel mailing list systemd-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/systemd-devel