On Fri, Jun 01, 2012 at 06:41:22PM -0400, Mike Frysinger wrote:
> regenerating autotools in packages that have a lot of AC_CONFIG_SUBDIRS is
> really slow due to the serialization of all the dirs (which really isn't
> required).  so i took some code that i merged into portage semi-recently
> (which is based on work by Brian, although i'm not sure he wants to admit it)

I've come up with worse things in the name of speed (see the 
daemonized ebuild processor...) ;)

> and put it into a new multiprocessing.eclass.  this way people can generically
> utilize this in their own eclasses/ebuilds.
> 
> it doesn't currently support nesting.  not sure if i should fix that.
> 
> i'll follow up with an example of parallelizing of eautoreconf.  for
> mail-filter/maildrop on my 4 core system, it cuts the time needed to run from
> ~2.5 min to ~1 min.

My main concern here is cleanup during uncontrolled shutdown.  If the 
backgrounded job has hung itself for some reason, the job *will* just 
sit there: I'm not aware of any of the PMs doing process tree killing 
or cgroups containment.  In my copious free time I'm planning on adding 
a 'cjobs' tool for others, and adding cgroups awareness into pkgcore, 
but none of 'em do this *now*, thus my concern.
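
As a stopgap, the parent shell could at least signal its own direct 
children on the way out.  A rough sketch only: multijob_kill_stragglers 
is a made-up name, not part of the proposed eclass, and this is neither 
process tree killing nor cgroups containment, so grandchildren can 
still survive it.

```shell
#!/bin/bash
# Hypothetical helper (not in the proposed eclass): terminate whatever
# background jobs this shell still has, then reap them.
multijob_kill_stragglers() {
  local pids
  pids=$(jobs -p)
  # Only direct children of this shell are signalled here.
  [[ -n ${pids} ]] && kill ${pids} 2>/dev/null
  wait
  return 0
}

# Demo: a job that would otherwise sit around for five minutes.
sleep 300 &
hung_pid=$!
multijob_kill_stragglers
```

In an ebuild environment this would presumably be hooked in with 
something like "trap multijob_kill_stragglers EXIT", assuming the PM 
leaves that trap alone.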



> -mike
> 
> # Copyright 1999-2012 Gentoo Foundation
> # Distributed under the terms of the GNU General Public License v2
> # $Header: $
> 
> # @ECLASS: multiprocessing.eclass
> # @MAINTAINER:
> # base-sys...@gentoo.org
> # @AUTHORS:
> # Brian Harring <ferri...@gentoo.org>
> # Mike Frysinger <vap...@gentoo.org>
> # @BLURB: parallelization with bash (wtf?)
> # @DESCRIPTION:
> # The multiprocessing eclass contains a suite of functions that allow ebuilds
> # to quickly run things in parallel using shell code.
> 
> if [[ ${___ECLASS_ONCE_MULTIPROCESSING} != "recur -_+^+_- spank" ]] ; then
> ___ECLASS_ONCE_MULTIPROCESSING="recur -_+^+_- spank"
> 
> # @FUNCTION: makeopts_jobs
> # @USAGE: [${MAKEOPTS}]
> # @DESCRIPTION:
> # Searches the arguments (defaults to ${MAKEOPTS}) and extracts the jobs number
> # specified therein.  Useful for running non-make tools in parallel too.
> # i.e. if the user has MAKEOPTS=-j9, this will show "9".
> # We can't return the number as bash normalizes it to [0, 255].  If the flags
> # haven't specified a -j flag, then "1" is shown as that is the default `make`
> # uses.  Since there's no way to represent infinity, we return 999 if the user
> # has -j without a number.
> makeopts_jobs() {
>       [[ $# -eq 0 ]] && set -- ${MAKEOPTS}
>       # This assumes the first .* will be more greedy than the second .*
>       # since POSIX doesn't specify a non-greedy match (i.e. ".*?").
>       local jobs=$(echo " $* " | sed -r -n \
>               -e 's:.*[[:space:]](-j|--jobs[=[:space:]])[[:space:]]*([0-9]+).*:\2:p' \
>               -e 's:.*[[:space:]](-j|--jobs)[[:space:]].*:999:p')
>       echo ${jobs:-1}
> }

This function belongs in eutils, or somewhere similar; I'm pretty sure 
we've got variants of this in multiple spots.  I'd prefer a single 
point to change if/when we add a way to pass parallelism down into the 
env via EAPI.


> # @FUNCTION: multijob_init
> # @USAGE: [${MAKEOPTS}]
> # @DESCRIPTION:
> # Setup the environment for executing things in parallel.
> # You must call this before any other multijob function.
> multijob_init() {
>       # Setup a pipe for children to write their pids to when they finish.
>       mj_control_pipe="${T}/multijob.pipe"
>       mkfifo "${mj_control_pipe}"
>       exec {mj_control_fd}<>${mj_control_pipe}
>       rm -f "${mj_control_pipe}"

Nice; hadn't thought to wipe the pipe on the way out.
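
For anyone wondering why that is safe: the fd opened with <> keeps the 
fifo alive after the name is unlinked, so nothing is left behind in 
${T}.  A small standalone demonstration, using a mktemp directory in 
place of ${T} and demo_fd as an arbitrary variable name (needs bash 
4.1+ for the {var} redirection syntax):

```shell
#!/bin/bash
# Create a fifo, open it read-write on a fresh fd, then unlink it.
demo_dir=$(mktemp -d)
mkfifo "${demo_dir}/pipe"
exec {demo_fd}<>"${demo_dir}/pipe"
rm -rf "${demo_dir}"

# The name is gone, but the open fd still works as a pipe.
echo "still works" >&${demo_fd}
read -r -u ${demo_fd} demo_line
```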

> 
>       # See how many children we can fork based on the user's settings.
>       mj_max_jobs=$(makeopts_jobs "$@")
>       mj_num_jobs=0
> }
> 
> # @FUNCTION: multijob_child_init
> # @DESCRIPTION:
> # You must call this first in the forked child process.
> multijob_child_init() {
>       [[ $# -eq 0 ]] || die "${FUNCNAME} takes no arguments"
> 
>       trap 'echo ${BASHPID} $? >&'${mj_control_fd} EXIT
>       trap 'exit 1' INT TERM
> }

I kind of dislike this form, since it means consuming code has to be 
aware of, and do, the ( ) & trick itself.

A helper function would hide that, something like:
multijob_child_job() {
  (
  multijob_child_init
  "$@"
  ) &
  multijob_post_fork || die "game over man, game over"
}

Doing so would convert your eautoreconf from:
for x in $(autotools_check_macro_val AC_CONFIG_SUBDIRS) ; do
  if [[ -d ${x} ]] ; then
    pushd "${x}" >/dev/null
    (
    multijob_child_init
    AT_NOELIBTOOLIZE="yes" eautoreconf
    ) &
    multijob_post_fork || die
    popd >/dev/null
  fi
done

To:
for x in $(autotools_check_macro_val AC_CONFIG_SUBDIRS) ; do
  if [[ -d ${x} ]]; then
    pushd "${x}" > /dev/null
    AT_NOELIBTOOLIZE="yes" multijob_child_job eautoreconf
    popd >/dev/null
  fi
done


Note, if we used an eval in multijob_child_job, the pushd/popd could 
be folded in.  Debatable.
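
One way to fold the directory change in without an eval is to pass the 
directory as the first argument and cd inside the subshell.  This is a 
sketch under assumptions: multijob_child_job_in is a hypothetical name, 
and the other functions below are cut-down stand-ins for the quoted 
eclass code (no die, job limit passed as a plain number) so the example 
runs on its own.

```shell
#!/bin/bash
# Cut-down stand-ins for the quoted eclass functions.
multijob_init() {
  local dir
  dir=$(mktemp -d) || return 1
  mkfifo "${dir}/pipe" || return 1
  exec {mj_control_fd}<>"${dir}/pipe"
  rm -rf "${dir}"
  mj_max_jobs=${1:-2}
  mj_num_jobs=0
}
multijob_child_init() {
  trap 'echo ${BASHPID} $? >&'${mj_control_fd} EXIT
  trap 'exit 1' INT TERM
}
multijob_finish_one() {
  local pid ret
  read -r -u ${mj_control_fd} pid ret
  : $(( --mj_num_jobs ))
  return ${ret}
}
multijob_post_fork() {
  : $(( ++mj_num_jobs ))
  if [[ ${mj_num_jobs} -ge ${mj_max_jobs} ]] ; then
    multijob_finish_one
  fi
  return $?
}
multijob_finish() {
  local ret=0
  while [[ ${mj_num_jobs} -gt 0 ]] ; do
    multijob_finish_one
    : $(( ret |= $? ))
  done
  wait
  return ${ret}
}

# Hypothetical variant: $1 is the directory, the rest is the command.
# The cd happens in the child, so the parent needs no pushd/popd.
multijob_child_job_in() {
  local dir=$1
  shift
  (
  multijob_child_init
  cd "${dir}" || exit 1
  "$@"
  ) &
  multijob_post_fork
}

# Demo: run a trivial job in two directories in parallel.
work=$(mktemp -d)
mkdir "${work}/a" "${work}/b"
mark_dir() { echo "ran in ${PWD##*/}" > marker ; }

multijob_init 2
for d in a b ; do
  multijob_child_job_in "${work}/${d}" mark_dir || echo "job failed" >&2
done
finish_status=0
multijob_finish || finish_status=$?
```

The eautoreconf loop body would then shrink to a single 
multijob_child_job_in "${x}" eautoreconf call per directory.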



> # @FUNCTION: multijob_post_fork
> # @DESCRIPTION:
> # You must call this in the parent process after forking a child process.
> # If the parallel limit has been hit, it will wait for one to finish and
> # return the child's exit status.
> multijob_post_fork() {
>       [[ $# -eq 0 ]] || die "${FUNCNAME} takes no arguments"
> 
>       : $(( ++mj_num_jobs ))
>       if [[ ${mj_num_jobs} -ge ${mj_max_jobs} ]] ; then
>               multijob_finish_one
>       fi
>       return $?
> }
> 
> # @FUNCTION: multijob_finish_one
> # @DESCRIPTION:
> # Wait for a single process to exit and return its exit code.
> multijob_finish_one() {
>       [[ $# -eq 0 ]] || die "${FUNCNAME} takes no arguments"
> 
>       local pid ret
>       read -r -u ${mj_control_fd} pid ret

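Mildly concerned about the failure case here, specifically if the read 
fails (fd was closed, take your pick).  In that case ret stays empty, 
so the later 'return ${ret}' degrades to a bare return and reports the 
status of the ':' command before it, i.e. success.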
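
A guarded version might look roughly like this.  It is a sketch, not a 
drop-in: it returns 1 where the eclass would more likely call die, and 
the error messages are invented for illustration.  Note that it leaves 
mj_num_jobs untouched on failure, so a caller looping on that counter 
has to bail out itself.

```shell
#!/bin/bash
multijob_finish_one() {
  local pid ret
  # A closed fd or EOF makes read fail; do not pretend the job passed.
  if ! read -r -u ${mj_control_fd} pid ret ; then
    echo "multijob: reading the control fd failed" >&2
    return 1
  fi
  # A short or garbled line leaves ret empty or non-numeric.
  if [[ ! ${ret} =~ ^[0-9]+$ ]] ; then
    echo "multijob: malformed status line: '${pid} ${ret}'" >&2
    return 1
  fi
  : $(( --mj_num_jobs ))
  return ${ret}
}

# Demo 1: the fd hits EOF immediately, so the read fails.
exec {mj_control_fd}</dev/null
mj_num_jobs=1
status_eof=0
multijob_finish_one || status_eof=$?
exec {mj_control_fd}<&-

# Demo 2: a well-formed "pid status" line comes through unchanged.
work=$(mktemp -d)
mkfifo "${work}/pipe"
exec {mj_control_fd}<>"${work}/pipe"
rm -rf "${work}"
echo "1234 3" >&${mj_control_fd}
status_ok=0
multijob_finish_one || status_ok=$?
```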


>       : $(( --mj_num_jobs ))
>       return ${ret}
> }
> 
> # @FUNCTION: multijob_finish
> # @DESCRIPTION:
> # Wait for all pending processes to exit and return the bitwise or
> # of all their exit codes.
> multijob_finish() {
>       [[ $# -eq 0 ]] || die "${FUNCNAME} takes no arguments"

Tend to think this should do cleanup, then die if someone invoked the 
api incorrectly; I'd rather see the children reaped before this blows 
up.
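
Roughly what I have in mind, as a sketch: the argument check simply 
moves below the reaping loop.  The die here is a trivial stand-in for 
the PM's real one, and the demo wiring (temp dir, sleep job) is only 
there to make the example runnable on its own.

```shell
#!/bin/bash
# Stand-in for the package manager's die(); the real one does far more.
die() { echo "die: $*" >&2 ; exit 1 ; }

multijob_finish_one() {
  local pid ret
  read -r -u ${mj_control_fd} pid ret
  : $(( --mj_num_jobs ))
  return ${ret}
}

multijob_finish() {
  local ret=0
  # Reap first, whatever the arguments look like.
  while [[ ${mj_num_jobs} -gt 0 ]] ; do
    multijob_finish_one
    : $(( ret |= $? ))
  done
  wait
  # Only now complain about API misuse.
  [[ $# -eq 0 ]] || die "${FUNCNAME} takes no arguments"
  return ${ret}
}

# Demo: call it wrongly while one child is still running.  The outer
# subshell keeps the stand-in die from ending this script.
work=$(mktemp -d)
finish_status=0
(
  mkfifo "${work}/pipe"
  exec {mj_control_fd}<>"${work}/pipe"
  mj_num_jobs=0
  (
    trap 'echo ${BASHPID} $? >&'${mj_control_fd} EXIT
    sleep 0.2
    touch "${work}/child_done"
  ) &
  : $(( ++mj_num_jobs ))
  multijob_finish bogus-argument
) || finish_status=$?
```

The call still dies, but only after the child has finished and been 
collected.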

>       local ret=0
>       while [[ ${mj_num_jobs} -gt 0 ]] ; do
>               multijob_finish_one
>               : $(( ret |= $? ))
>       done
>       # Let bash clean up its internal child tracking state.
>       wait
>       return ${ret}
> }
> 
> fi


~harring
