On 9/12/2012 5:58 AM, Ian Stakenvicius wrote:

On 12/09/12 05:55 AM, Gregory M. Turner wrote:

Note that, effectively, we have this already, and it's called
"portage". But one could certainly make a case for modularizing it
better, since, in truth, we are talking about a very common, very
abstract problem here which portage shares with any number of
batch-build systems.

Such an engine could very well do exactly the right thing if it
were faced with a constraint that a certain part of a certain build
needed to proceed without parallelism due to limitations coming
from the build.

Also, there are very large parts of most builds -- configure comes
to mind -- that don't parallelize even if, perhaps, they should.
In such cases, a really smart global parallelism arbiter could
easily respond by spawning more jobs from other builds.


So essentially what you're saying here is that it might be worthwhile
to look into parallelism as a whole and possibly come up with a
solution that combines 'emerge --jobs' and build-system parallelism
together to maximum benefit?

Yeah, couldn't have said it better myself ... apparently :)

Advanced HPC systems (sys-cluster/torque along with an appropriate
scheduler, for instance) can do such things with their jobs when the
jobs are properly built; I could see portage being able to handle this
as well given most of what is necessary is already known (ebuild
phases, build system type (via eclass), etc.).  However, given the
limitations already put on parallelism in terms of emerge order, etc.,
I could see this solution needing to be -very- complex and integration
needing to occur on multiple levels.  We'd also need to consider
distcc (and other cluster-shared compilation methods, if there are
any).  It would be an interesting project, though.

ACK all of the above.

It's tempting to think more deeply about this, but probably the last thing I need to do right now is talk myself into another speculative project.

I've hurt my wrist a bit -- probably an RSI -- so that should help deter me :S

Only a few major sources of parallelism exist in portage: --jobs / --load-average in emerge opts, multiprocessing eclass & equiv. ebuild helper, distcc, and make... Infrastructure is already in place for all of those, so perhaps a good holistic solution exists that isn't /too/ complicated.
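For reference, here is roughly how the emerge-level and make-level knobs are set today. This is only a sketch of a make.conf fragment, and the values are arbitrary examples, not recommendations:

```shell
# make.conf (fragment) -- example values only

# Package-level parallelism: how many packages emerge may build at once,
# and the load average above which it stops starting new ones.
EMERGE_DEFAULT_OPTS="--jobs=4 --load-average=4"

# Build-level parallelism: emake passes these options on to make.
MAKEOPTS="-j4 -l4"
```

Note that the two settings know nothing about each other, which is the gap the rest of this mail is about.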

...OK another f!#!%$^ brainstorm incoming :)

For "JOBS" syntax... what really seems missing in portage are:

  o a clean way to say "don't parallelize this particular make
    invocation" in ebuilds

  o a clean way to globally say "try to use this parallelization
    strategy when emerging."

So what about something like:

  o EMERGE_JOBS and EMERGE_LOAD_AVERAGE make.conf vars equiv. to
    --jobs and --load-average emerge options

  o EBUILD_JOBS and EBUILD_LOAD_AVERAGE make.conf vars

  o If the latter are not specified, they are copied respectively from
    the former (debatable for *_JOBS, since asking for four jobs could
    then yield up to 4 x 4 = 16 processes).

  o MAKEOPTS is auto-extended to reflect EBUILD_JOBS/EBUILD_LOAD_AVERAGE
    if & only if -j|--jobs|-l|--load-average options aren't provided in
    make.conf/profile/envvar MAKEOPTS

  o however, if MAKEOPTS "overrides" EBUILD_JOBS or EBUILD_LOAD_AVERAGE,
    issue a conspicuous yellow-stars warning

  o extend "emake" to accept a "--non-parallel" option which will
    strip all -j|--jobs|-l|--load-average options from MAKEOPTS;
    perhaps support an equivalent EBUILD_NON_PARALLEL envvar as well,
    with support for override in profile.bashrc. Don't warn about this
    overriding EBUILD_JOBS -- treat as SOP.

  o debatable: respect EBUILD_NON_PARALLEL in multiprocessing, etc?
    or, perhaps, something like:

    EMAKE_NON_PARALLEL=${EMAKE_NON_PARALLEL:-${EBUILD_NON_PARALLEL:-no}}

    could be used to distinguish between "don't use any parallelism"
    and "don't use GNU make's parallelism in emake".  Also, maybe a
    better name exists that doesn't rely on a double negative.

?
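To make the MAKEOPTS rules in the list above a bit more concrete, here is a rough sketch in plain POSIX sh. Everything in it is hypothetical: has_parallel_opts, effective_makeopts and strip_parallel_opts are made-up names, not existing portage functions, and the option matching is deliberately naive (it ignores bundled short options such as -kj4, and the helpers use global variables because POSIX sh has no "local").

```shell
# Hypothetical sketch; none of these helpers exist in portage.

# Succeed if the option string $1 already carries -j/--jobs/-l/--load-average.
has_parallel_opts() {
    for opt in $1; do
        case $opt in
            -j*|-l*|--jobs|--jobs=*|--load-average|--load-average=*) return 0 ;;
        esac
    done
    return 1
}

# Print the MAKEOPTS an ebuild would see.
#   $1 = MAKEOPTS, $2 = EBUILD_JOBS, $3 = EBUILD_LOAD_AVERAGE (either may be empty)
effective_makeopts() {
    if has_parallel_opts "$1"; then
        # Explicit MAKEOPTS win, but say so if EBUILD_* gets overridden.
        # (Inside portage this would presumably be an ewarn-style message.)
        if [ -n "$2$3" ]; then
            echo " * MAKEOPTS overrides EBUILD_JOBS/EBUILD_LOAD_AVERAGE" >&2
        fi
        printf '%s\n' "$1"
    else
        ext=$1
        if [ -n "$2" ]; then ext="${ext:+$ext }-j$2"; fi
        if [ -n "$3" ]; then ext="${ext:+$ext }-l$3"; fi
        printf '%s\n' "$ext"
    fi
}

# What "emake --non-parallel" might do: drop every parallelism option,
# including the separate-argument forms "-j 4" and "-l 3.5".
strip_parallel_opts() {
    out=
    skip_num=no
    for opt in $1; do
        if [ "$skip_num" = yes ]; then
            skip_num=no
            case $opt in
                *[!0-9.]*) ;;     # not a number: examine it normally below
                *) continue ;;    # numeric argument of the preceding -j/-l
            esac
        fi
        case $opt in
            -j|-l) skip_num=yes ;;
            -j*|-l*|--jobs|--jobs=*|--load-average|--load-average=*) ;;
            *) out="${out:+$out }$opt" ;;
        esac
    done
    printf '%s\n' "$out"
}

# effective_makeopts "-k" 4 3.5            prints: -k -j4 -l3.5
# effective_makeopts "-j8" 4 ""            prints: -j8 (and warns on stderr)
# strip_parallel_opts "-j4 -l 3.5 -k V=1"  prints: -k V=1
```

The point is only that the precedence rules fit in a few lines; where this logic would actually live (emake, the multiprocessing eclass, or portage proper) is left open.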

Seems to me something vaguely like the above would provide

  o backward compatibility for ebuilds and make.conf

  o an interface not so vastly different from what we have

  o a decent way to specify what "we really want" globally;
    insofar as portage doesn't do the best job of effecting the
    requested parallelization strategy, more ambitious tactics could
    be implemented later, hopefully without huge interface revisions.

-gmt

P.S.:

(Kind-of-crazy additional idea: put ceil(sqrt(EMERGE_JOBS)) into EBUILD_JOBS when only the former is specified, and then let effective_emerge_jobs equal floor(EMERGE_JOBS/EBUILD_JOBS).... but maybe too much automagic for this to be a good idea.)
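A quick worked version of that arithmetic, again as a hypothetical POSIX sh sketch (ceil_sqrt and split_jobs are made-up names, and EMERGE_JOBS is assumed to be an integer >= 1):

```shell
# Hypothetical helpers illustrating the split; not part of portage.

# Print the smallest integer r with r*r >= $1, i.e. ceil(sqrt($1)).
ceil_sqrt() {
    r=0
    while [ $((r * r)) -lt "$1" ]; do
        r=$((r + 1))
    done
    echo "$r"
}

# Print "EBUILD_JOBS effective_emerge_jobs" for a given EMERGE_JOBS ($1 >= 1).
split_jobs() {
    ebuild_jobs=$(ceil_sqrt "$1")
    # Shell integer division truncates, which is floor() for positive values.
    echo "$ebuild_jobs $(( $1 / ebuild_jobs ))"
}

# split_jobs 16   prints: 4 4
# split_jobs 10   prints: 4 2
```

Because of the floor, the product of the two numbers never exceeds the requested EMERGE_JOBS, so asking for 16 gives at most 4 x 4 processes instead of 16 x 16.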
