On Tue, Nov 05, 2019 at 03:22:08PM +0000, Lauri Tirkkonen wrote:
> Hi,
> 
> make has the concept of 'expensive' jobs: if the command line includes
> the word "make" in it, the command is considered expensive and other
> jobs are held until that job finishes. It does this to avoid exponential
> behavior when parallelizing with -j.
> 
> Some other makes (eg. bmake and gmake) use a jobserver, ie. a pair of
> pipes between child makes and the top-level make: the top-level make
> provides job tokens to children via these pipes, which are returned when
> the jobs complete. This ensures that recursive makes only run a certain
> number of jobs in total at any one time, while achieving greater
> resource utilisation (no need to wait for an 'expensive' job to finish
> before starting others).
> 
> This diff implements a similar jobserver in make. Like bmake and gmake,
> job tokens are provided by the top-level make (called "jobserver
> master"), used by children ("jobserver slaves") when they run commands,
> and returned when those commands finish. Like bmake, the internal,
> undocumented command line option to tell (possibly indirect, via
> MAKEFLAGS)) children about the file descriptor to use is -J, but unlike
> bmake and gmake, we use socketpair(AF_UNIX, SOCK_STREAM) instead of
> pipes, and therefore only need to pass one file descriptor to children
> instead of two.

Sorry, I missed your initial message (thanks to naddy@ for telling me about it)

A jobserver was considered in make before implementing expensive.

The main reason it was not done that way is that you can't guarantee a
file descriptor will make it through the intermediate shell.  

In fact, POSIX doesn't guarantee anything, and a lot of shells actively close
file descriptors they don't know about.

Your example is not complicated enough, as your commands mean NO 
shell will be spawned (everything will pass directly through to 
submakes, which is BTW a large reason for implementing -C, which cuts 
down on the number of intermediate processes).

I'm not sure I like added complexity in that part of make.

> I haven't yet built the entire tree with this, just cautiously probing
> the waters first to see if there is interest :)

I'd like some actual numbers on real world applications: what does it change
for the actual build through the full source tree.   Areas like perl are where
the job explosion occurs and where the expensive heuristics got refined.


> Tested with the following Makefile structure:
> 
>       $ cat recursive/Makefile
>       .PHONY: a b c d
> 
>       all: a b c d
> 
>       a:
>               make -C a
>       b:
>               make -C a
>       c:
>               make -C a
>       d:
>               make -C a
>       $ cat recursive/a/Makefile
>       all: 1 2 3 4
> 
>       1:
>               sleep 1
>       2:
>               sleep 2
>       3:
>               sleep 3
>       4:
>               sleep 4
> 
> the speedup is as follows with -j4:
> 
> before:
>       make -C recursive -j4  0.01s user 0.10s system 0% cpu 16.097 total
> after:
>       make -C recursive -j4  0.02s user 0.15s system 1% cpu 11.082 total

Comments on the patch proper:

> ---
>  usr.bin/make/job.c  | 317 +++++++++++++++++++++++++++++++++++---------
>  usr.bin/make/job.h  |   7 +-
>  usr.bin/make/main.c |  31 ++++-
>  usr.bin/make/main.h |   3 +
>  4 files changed, 290 insertions(+), 68 deletions(-)
> 
> diff --git a/usr.bin/make/job.c b/usr.bin/make/job.c
> index ebbae5306da..b36aaeacac8 100644
> --- a/usr.bin/make/job.c
> +++ b/usr.bin/make/job.c
> @@ -90,11 +90,15 @@
>   *   Job_Wait                Wait for all running jobs to finish.
>   */
>  
> +#include <sys/socket.h>
>  #include <sys/types.h>
>  #include <sys/wait.h>
> +#include <assert.h>
>  #include <ctype.h>
>  #include <errno.h>
> +#include <err.h>
>  #include <fcntl.h>
> +#include <poll.h>
>  #include <signal.h>
>  #include <stdarg.h>
>  #include <stdio.h>
> @@ -115,6 +119,7 @@
>  #include "memory.h"
>  #include "make.h"
>  #include "buf.h"
> +#include "main.h"
>  
>  static int   aborting = 0;       /* why is the make aborting? */
>  #define ABORT_ERROR  1           /* Because of an error */
> @@ -123,12 +128,15 @@ static int      aborting = 0;       /* why is the make 
> aborting? */
>  
>  static int   maxJobs;        /* The most children we can run at once */
>  static int   nJobs;          /* Number of jobs already allocated */
> -static bool  no_new_jobs;    /* Mark recursive shit so we shouldn't start
> -                              * something else at the same time
> -                              */
I don't think ditching expensive entirely is a good idea anyhow.
What happens if the jobserver fails ?  I see it goes to err all the time.
BTW, going to err is bad. Make handles processes. Most errors MUST do 
something sane, like waiting for processes to end in many many cases.

> +
> +struct jobserver {
> +     int     master;         /* master deposits job tokens into this fd */
> +     int     slave;          /* slaves consume job tokens from this fd */
> +     volatile sig_atomic_t tokens;   /* currently available tokens */
> +};
you didn't really read the code, did you ? why volatile sig_atomic_t ?
> +static struct jobserver jobserver;
>  Job *runningJobs;            /* Jobs currently running a process */
>  Job *errorJobs;                      /* Jobs in error at end */
> -static Job *heldJobs;                /* Jobs not running yet because of 
> expensive */
>  static pid_t mypid;          /* Used for printing debugging messages */
>  
>  static volatile sig_atomic_t got_fatal;
> @@ -144,11 +152,16 @@ static void postprocess_job(Job *, bool);
>  static Job *prepare_job(GNode *);
>  static void determine_job_next_step(Job *);
>  static void remove_job(Job *, bool);
> -static void may_continue_job(Job *);
>  static void continue_job(Job *);
>  static Job *reap_finished_job(pid_t);
>  static bool reap_jobs(void);
>  
> +static void jobserver_init(unsigned, int);
> +static bool jobserver_is_slave(void);
> +static void jobserver_master_reclaim_tokens(void);
> +static void jobserver_master_send_tokens(void);
> +static void jobserver_acquire_token(Job *job);
> +static void jobserver_release_token(Job *job);
>  static void loop_handle_running_jobs(void);
>  static bool expensive_job(Job *);
>  static bool expensive_command(const char *);
> @@ -347,10 +360,9 @@ print_errors(void)
>  static void
>  setup_signal(int sig)
>  {
> -     if (signal(sig, SIG_IGN) != SIG_IGN) {
> -             (void)signal(sig, notice_signal);
> -             sigaddset(&sigset, sig);
> -     }
> +     sigaction(sig, &(struct sigaction) { .sa_handler = notice_signal },
> +         NULL);
> +     sigaddset(&sigset, sig);
>  }
Why did you change this ? this is gratuitous.
Besides, I'm not 100% sure you can do that, I think it will break m88k.

>  static void
> @@ -551,20 +563,15 @@ postprocess_job(Job *job, bool okay)
>               Finish();
>  }
>  
> -/* expensive jobs handling: in order to avoid forking an exponential number
> - * of jobs, make tries to figure out "recursive make" configurations.
> - * It may err on the side of caution.
> +/* expensive jobs handling: make tries to figure out "recursive make"
> + * configurations. It may err on the side of caution.
>   * Basically, a command is "expensive" if it's likely to fork an extra
>   * level of make: either by looking at the command proper, or if it has
>   * some specific qualities ('+cmd' are likely to be recursive, as are
>   * .MAKE: commands).  It's possible to explicitly say some targets are
>   * expensive or cheap with .EXPENSIVE or .CHEAP.
??? if you kill half that thing, kill it all, or not. Looks like half-baked
code.


> - * While an expensive command is running, no_new_jobs
> - * is set, so jobs that would fork new processes are accumulated in the
> - * heldJobs list instead.
> - *
> - * This heuristics is also used on error exit: we display silent commands
> + * This heuristics is used on error exit: we display silent commands
>   * that failed, unless those ARE expensive commands: expensive commands
>   * are likely to not be failing by themselves, but to be the result of
>   * a cascade of failures in descendant makes.
> @@ -574,7 +581,6 @@ determine_expensive_job(Job *job)
>  { 
>       if (expensive_job(job)) {
>               job->flags |= JOB_IS_EXPENSIVE;
> -             no_new_jobs = true;
>       } else
>               job->flags &= ~JOB_IS_EXPENSIVE;
>       if (DEBUG(EXPENSIVE))
> @@ -672,23 +678,14 @@ prepare_job(GNode *gn)
>       }
>  }
>  
> -static void
> -may_continue_job(Job *job)
> -{
> -     if (no_new_jobs) {
> -             if (DEBUG(EXPENSIVE))
> -                     fprintf(stderr, "[%ld] expensive -> hold %s\n",
> -                         (long)mypid, job->node->name);
> -             job->next = heldJobs;
> -             heldJobs = job;
> -     } else
> -             continue_job(job);
> -}
> -
>  static void
>  continue_job(Job *job)
>  {
> -     bool finished = job_run_next(job);
> +     bool finished;
> +
> +     jobserver_acquire_token(job);
> +
> +     finished = job_run_next(job);
>       if (finished)
>               remove_job(job, true);
>       else
> @@ -715,27 +712,19 @@ Job_Make(GNode *gn)
>       if (!job)
>               return;
>       nJobs++;
> -     may_continue_job(job);
> +     continue_job(job);
>  }
>  
>  static void
>  determine_job_next_step(Job *job)
>  {
>       bool okay;
> -     if (job->flags & JOB_IS_EXPENSIVE) {
> -             no_new_jobs = false;
> -             if (DEBUG(EXPENSIVE))
> -                     fprintf(stderr, "[%ld] "
> -                         "Returning from expensive target %s, "
> -                         "allowing new jobs\n", (long)mypid, 
> -                         job->node->name);
> -     }
>  
>       okay = job->exit_type == JOB_EXIT_OKAY;
>       if (!okay || job->next_cmd == NULL)
>               remove_job(job, okay);
>       else
> -             may_continue_job(job);
> +             continue_job(job);
>  }
>  
>  static void
> @@ -743,17 +732,6 @@ remove_job(Job *job, bool okay)
>  {
>       nJobs--;
>       postprocess_job(job, okay);
> -     while (!no_new_jobs) {
> -             if (heldJobs != NULL) {
> -                     job = heldJobs;
> -                     heldJobs = heldJobs->next;
> -                     if (DEBUG(EXPENSIVE))
> -                             fprintf(stderr, "[%ld] cheap -> release %s\n",
> -                                 (long)mypid, job->node->name);
> -                     continue_job(job);
> -             } else
> -                     break;
> -     }
>  }
>  
>  /*
> @@ -793,6 +771,7 @@ reap_jobs(void)
>       while ((pid = waitpid(WAIT_ANY, &status, WNOHANG)) > 0) {
>               reaped = true;
>               job = reap_finished_job(pid);
> +             jobserver_release_token(job);
>  
>               if (job == NULL) {
>                       Punt("Child (%ld) not in table?", (long)pid);
> @@ -810,6 +789,16 @@ reap_jobs(void)
>  void
>  handle_running_jobs(void)
>  {
> +     /*
> +      * If we are the jobserver master and all jobs have already been
> +      * started, close the inheritable slave socket if open; this way the
> +      * master socket will become disconnected once all children have died.
> +      */
> +     if (no_jobs_left() && jobserver.master != -1 && jobserver.slave != -1) {
> +             close(jobserver.slave);
> +             jobserver.slave = -1;
> +     }
> +
>       sigset_t old;
>       /* reaping children in the presence of caught signals */
>  
> @@ -830,10 +819,33 @@ handle_running_jobs(void)
>               handle_all_signals();
>               if (reap_jobs())
>                       break;
> -             /* okay, so it's safe to suspend, we have nothing to do but
> -              * wait...
> -              */
> -             sigsuspend(&emptyset);
> +             if (jobserver_is_slave()) {
> +                     /* okay, so it's safe to suspend, we have nothing to do
> +                      * but wait...
> +                      */
> +                     sigsuspend(&emptyset);
> +             }
> +             else {
> +                     struct pollfd pfd = {
> +                             .fd = jobserver.master,
> +                             .events = POLLIN,
> +                     };
> +                     int r;
> +
> +                     if (jobserver.tokens > 0)
> +                             pfd.events |= POLLOUT;
> +
> +                     /* Wait for a signal or for a slave to communicate over
> +                      * the jobserver socket */
> +                     r = ppoll(&pfd, 1, NULL, &emptyset);
> +                     if (r < 0 && errno != EINTR)
> +                             err(1, "jobserver master ppoll");
> +
> +                     if (pfd.revents & POLLIN)
> +                             jobserver_master_reclaim_tokens();
> +                     if (pfd.revents & POLLOUT)
> +                             jobserver_master_send_tokens();
> +             }
>       }
>       sigprocmask(SIG_SETMASK, &old, NULL);
>  }
> @@ -859,20 +871,200 @@ handle_one_job(Job *job)
>  }
>  
>  static void
> -loop_handle_running_jobs()
> +loop_handle_running_jobs(void)
>  {
>       while (runningJobs != NULL)
>               handle_running_jobs();
>  }
Yet another gratuitous change.  There is already a perfectly fine prototype
that asserts (void).
> +int
> +jobserver_get_slave_fd(void)
> +{
> +     return jobserver.slave;
> +}
> +
> +static void
> +jobserver_init(unsigned maxtokens, int fd)
> +{
> +     int sock[2];
> +
> +     if (fd != -1) {
> +             /* received fd from master via -J */
> +             jobserver.master = -1;
> +             jobserver.slave = fd;
> +             return;
> +     }
> +
> +     /* we are the master; create sockets */
> +     if (socketpair(AF_UNIX, SOCK_STREAM, 0, sock) < 0)
> +             err(1, "could not create jobserver sockets");
don't ever use < 0 on syscalls.
> +     jobserver.master = sock[0];
> +     jobserver.slave = sock[1];
> +     jobserver.tokens = maxtokens;
> +
> +     if (fcntl(jobserver.master, F_SETFD, O_CLOEXEC) < 0)
> +             err(1, "could not set jobserver master fd flags");
> +}
> +
> +static bool
> +jobserver_is_slave(void)
> +{
> +     return jobserver.master == -1 && jobserver.slave != -1;
> +}

I'll stop commenting at this point, because this code is nowhere near 
fine for now.

I'm not saying having a jobserver might not be a good idea, AS AN OPTION, 
but this would need to look like openbsd code, compile on every 
openbsd architecture, *and have proper error paths when it fails*.

You have zero guarantee this works on all shells.

The expensive heuristics might be the only reliable way we have of
limiting the proliferation of jobs, so error recovery must be way more
graceful.


It also won't play well with other makes (contrary to expensive).

You didn't say whether there is a convention that (say) gmake or other
bsd makes use for a job server... might be useful to communicate correctly
with these (right now, what happens with MAKEOPTIONS and a child gmake,
for instance).

Call me paranoid, but I'm not too fond of having an extra fd that passes
through each and every subprocess.

The first step would definitely be to check how your patch works in
real conditions (at least a full make release, and maybe a significant
subset of ports).

Assuming this goes well, for starters, I'd like the change to be way 
more clearly separated, to not included gratuitous changes to existing code, 
to look like OpenBSD code, and to have decent error paths.

Reply via email to