Hi John-Paul,

On 22/02/16 17:48, John-Paul Robinson wrote:
Thanks for the update on the status and thoughts around HPCBIOS.  I'm
interested in exploring these options.

We are in the process of migrating from ROCKS5 (CentOS5) to a platform
based on CentOS7. As part of this, we're adopting EasyBuild to manage
builds.  A large part of our community is bioinformatics oriented.  They
use a mix of popular tools on the cli HPC side which have been
self-managed.  On the web interface side, we have a Galaxy instance
with its own tool collection.  A curated reference model like HPCBIOS
could be a useful reference for our community.

We will attempt the --try-toolchain-version as a first pass.  I did
attempt this bump approach to move Cufflinks to an updated goolf but ran
into a number of cross dependencies.  Cufflinks has some complex
dependencies and I was new to EasyBuild so that may have had something
to do with it. I eventually had to correct a number of easyconfigs to
move to goolf-1.7.20:
https://github.com/hpcugent/easybuild-easyconfigs/pull/2058

After some reflection, I've come up with an Easy* solution.  I created a
texinfo-4.13a easyconfig which will allow GCC 4.7.2 to build when
loaded.  This bootstraps the goolf-1.4.10 chain and should enable using
most of the HPCBIOS easyconfigs directly.  (eb --from-pr 2543)

This makes sense, and you can even fix it once and for all by issuing a PR to include texinfo 4.13a as a build dep in the GCC 4.7.2 easyconfig file:

    name = 'GCC'
    version = '4.7.2'

    ...

    # make sure dependencies are loaded when building/installing GCC,
    # by using an empty version (rather than using 'dummy' as version)
    toolchain = {'name': 'dummy', 'version': ''}

    ...

    # less strict texinfo version required, which allowed document parsing error
    # (this is fixed in more recent versions of GCC)
    builddependencies = [('texinfo', '4.13a')]


The easiest way is to provide an easyconfig file for texinfo 4.13a that is built with the dummy toolchain (i.e. texinfo-4.13a.eb), so that one version of GCC is not required to build another version of GCC. That does leave us at the mercy of the OS-provided compiler to get texinfo 4.13a built, but that may be acceptable here...
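
For illustration, a dummy-toolchain easyconfig along those lines might look roughly like the sketch below. This is not the file from PR 2543: the download URL, tarball name, sanity-check files and module class are all assumptions, and the literal strings stand in for the template constants a real easyconfig would normally use.

```python
# Hypothetical texinfo-4.13a.eb (a sketch only, not the easyconfig from the PR)
easyblock = 'ConfigureMake'

name = 'texinfo'
version = '4.13a'

homepage = 'https://www.gnu.org/software/texinfo/'
description = "Texinfo is the official documentation format of the GNU project."

# dummy toolchain: texinfo is built with the OS-provided compiler,
# so no EasyBuild-installed GCC is needed to bootstrap GCC 4.7.2
toolchain = {'name': 'dummy', 'version': ''}

# assumed download location and tarball name
source_urls = ['https://ftp.gnu.org/gnu/texinfo/']
sources = ['texinfo-%s.tar.gz' % version]

# assumed set of installed binaries to verify after the build
sanity_check_paths = {
    'files': ['bin/info', 'bin/makeinfo'],
    'dirs': [],
}

moduleclass = 'devel'
```

With a file like this in the robot search path, the builddependencies entry in the GCC 4.7.2 easyconfig above would have something to resolve against.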



This seems like a reasonable short term solution for seeding our
environment and may provide reproducibility over the long term.  It will
also be worth the effort to update to goolf-1.7.20 and newer chains.
I'm not deeply familiar with version motivations and build options for
bioinformatics pipelines but do want to provide our users with the most
efficient execution on newer hardware.  Having an updated HPCBIOS and
easyconfigs seems like a good way to accomplish that and prepare for the
improvements promised by the Xeon Phi later this year
(http://software.intel.com/XeonPhiCatalog).

Promises, promises. ;-)


regards,

Kenneth

Thanks again and look forward to continuing this conversation,

~jpr

On 02/19/2016 07:37 PM, Fotis Georgatos wrote:
Hi jpr!

On Feb 18, 2016, at 7:43 PM, Kenneth Hoste <[email protected]> wrote:
One thing you should be aware of is --try-toolchain, and the fact that it works 
nicely together with --robot.

See for example the output of "eb pBWA-0.5.9_1.21009-goolf-1.4.10.eb -D 
--try-toolchain-version=1.7.20";
new easyconfigs for both BWA and pBWA will be generated for you, in which the 
toolchain version is replaced.

That should make it significantly easier to 'bump' the toolchain.
This.

In principle, it should be possible to use the above technique combined
with the existing HPCBIOS bundles and actually arrive quite far.
I’ve done it several times, the LifeSciences cases tend to be very robust.
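
To make the bump concrete, here is a small Python sketch. It assembles the eb command line from the options mentioned above (--robot, -D, --try-toolchain-version) and predicts the filename of the generated easyconfig. Both helper functions are hypothetical, and the filename pattern is an assumption based on the examples in this thread; it is not how EasyBuild itself implements the rewrite.

```python
import re


def eb_bump_command(easyconfig, new_version, dry_run=True):
    """Build an 'eb' command line that retries a build with a newer toolchain version."""
    cmd = ["eb", easyconfig, "--robot",
           "--try-toolchain-version=%s" % new_version]
    if dry_run:
        # -D only prints what would be built, without building anything
        cmd.append("-D")
    return cmd


def bump_toolchain_filename(filename, toolchain, new_version):
    """Predict the easyconfig filename after the toolchain version is replaced.

    Assumes names of the form <name>-<version>-<toolchain>-<tcversion>[-<suffix>].eb.
    """
    # match '-<toolchain>-<tcversion>', stopping before a '-<suffix>' or the final '.eb'
    pattern = r"-%s-[0-9][0-9A-Za-z.]*?(?=-|\.eb$)" % re.escape(toolchain)
    replacement = "-%s-%s" % (toolchain, new_version)
    bumped, count = re.subn(pattern, lambda match: replacement, filename, count=1)
    if count == 0:
        raise ValueError("toolchain %r not found in %r" % (toolchain, filename))
    return bumped


if __name__ == "__main__":
    ec = "pBWA-0.5.9_1.21009-goolf-1.4.10.eb"
    print(" ".join(eb_bump_command(ec, "1.7.20")))
    print(bump_toolchain_filename(ec, "goolf", "1.7.20"))
```

The command list can be passed to subprocess.run as-is; dropping dry_run turns the preview into an actual build.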

It is true that they have all started showing their age, though, due to toolchains:
- the first-generation goolf/1.4.10 includes the GCC version with the issues you mentioned;
   we could agree on, for instance, a 1.4.11 strain just to get by - nothing complicated with that.
- ictce/5.3.0 is even more problematic: Intel has ceased releasing those particular sources,
   on the premise of some compiler instability (among all icc/ifort variants, that particular one).
   But you can still do --try-toolchain=ictce,5.5.0 and get very useful things produced, so it is good as-is.


Since you are looking into it, we now have the chance to improve on a number of fronts, such as:
[0] Refresh both goolf & intel variants with their more modern equivalent toolchains
[1] Reconcile LifeSciences, Bioinfo & Math common dependencies by ironing out annoying Boost collisions
[2] Weed out/fix problematic dependencies from Bioinfo (Intel ipp in goolf variant, I am looking at you)
[3] Push out a few more HPCBIOS targets which are collecting dust in some (of my) digital corners :)
[4] Test against multiple distros to stabilise the deps into convenient choices (more art than science here)
[5] New configs should rather use the new YAML-like easyconfig format (more suitable for this activity).

I especially like item [1] because it makes it possible to mix’n’match bundles at will,
but it takes a good deal of effort before things start to meld together nicely.

In an ideal world, we would have an intern or two with some good 
CI/docker/distros skills,
isolate her or them from daily systems firefighting, and the above would occur 
quite fast.

Alas...

If anybody is available to put man-hours into this, please step forward, to 
follow up with a hangout.

Fotis
