Hi Fotis,
Some remarks below. Given the size of this message, maybe it's a good idea to
follow up on the individual topics in separate threads with modified subjects?
On 03 Aug 2013, at 12:46, Fotis Georgatos wrote:
>
> Hello EasyBuilders,
>
> I have been considering proposing these topics for some weeks now,
> it seems the planets now came to alignment, partly thanks to the wave of v1.6
> :)
> v1.6 is a really good release that meets many expectations.
>
>
> 1) jumbo-toolchain for bio* category now working nicely
>
> First, v1.6 has a couple interesting little easyconfigs, biodeps{,-extended},
> which are supposed to be a common dependency list for life science
> applications,
> that in turn permits to chain together modules in a pipeline of multiple
> components.
>
> Overall it works well and, you can entertain yourself by co-loading 70+
> modules in one go:
> https://github.com/fgeorgatos/easybuild.experimental/blob/master/users/fgeorgatos/HPCBIOS/HPCBIOS_Bioinfo-goolf-1.4.10.eb
> Some packages still need a tweak to be mergeable in (ABySS, BLAT, BLAST,
> NCBI*, Trinity, TopHat)
> IMHO, using it just for building the software in one-go is probably of more
> practical use.
What's missing here exactly? Simply easyconfigs that rely on biodeps?
> 2) standardization, yeah even more
>
> That said, trivialities like colliding Boost<->Boost-Python or
> biodeps<->biodeps-extended, quickly point out that biodeps business
> is really more of a "standardization" work rather than implementation;
> that's why this brainstorm exists in the first place:
> http://hpcbios.readthedocs.org/en/latest/HPCBIOS_2013-01.html
> Comments on how to update/improve it and make it more generic are very
> welcome.
>
> During this week, pfo made a comment about refreshing the SAMtools version;
> I think at some point in time we would all benefit by upgrading these
> versions:
> http://hpcbios.readthedocs.org/en/latest/HPCBIOS_2012-94.html
> and you will find plenty more eventually desired to add in, such as:
> http://www.eaglegenomics.com/2012/04/the-elements-of-bioinformatics/
> As you can see, we already implement a good chunk of the latter via EasyBuild!
> So, I'd suggest *bioinfo* stakeholders get their act together in some way,
> eg. by running an e-sprint by end of year, to update towards some common
> targets (ie. define a standard and the implementation, say, v20131201)
Can you open an issue for an updated/extended biodeps easyconfig, and use that
to drive the discussion?
Do include the URLs you mention above in that issue, since they are really good
sources of info w.r.t. this.
> 3) obtaining existing and new application sources
>
> Life has it, that developers don't always have an understanding of what it
> means
> to run HPC operations and play tricks around software distribution channels,
> ranging from non-tagging a git repo (benign) up to flatly swapping tarballs
> (annoying):
> see https://github.com/hpcugent/easybuild-easyconfigs/pull/374/files
>
> Especially modifying the release channel mid-flight, is a big hassle; I did
> spend
> an extra few hours to understand why Rgputools PR arrived broken on github:
> :-(
> https://github.com/hpcugent/easybuild-easyconfigs/pull/282
>
> In short, let's pool our tarballs together in some way,
> define preferred release streams and avoid these issues recurring.
> Bioinfo toolchain is especially prone to this, due to the many small tools.
How is the mirror-thing you were working on coming along? That seems to be a
critical element in all this...
Besides that: there's little we can do against software devs updating released
tarballs without bumping the version.
It's obviously wrong, but maybe they don't see it as a big problem.
Without checking for source 'correctness' in terms of MD5 sums or something
like that, there's no way of telling whether the sources you download are the
same ones that were used by the easyconfig author...
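To make the checksum idea a bit more concrete, here is a minimal sketch of
what such a check could look like in Python. The function names md5sum and
verify_source are hypothetical and are not part of EasyBuild; the sketch also
assumes the easyconfig author has published the expected MD5 sum somewhere.

```python
import hashlib


def md5sum(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of the file at 'path', read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in fixed-size chunks so large tarballs do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_source(path, expected_md5):
    """Hypothetical helper: True if the file matches the recorded MD5 sum."""
    return md5sum(path) == expected_md5.strip().lower()
```

A mismatch would then signal that the tarball was swapped upstream (or that
the download is corrupt) before any build time is wasted on it. MD5 is used
here only because it is the example named above; a stronger hash would work
the same way.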
> 4) arch-aware/distro-aware organization
>
> It would be nice to poll opinions on different implementation directions
> and come up with some common codebase, functional for different needs;
> (easybuild has its own logic about deciding OS, fi. to setup CUDA)
>
> fi. compare below $PRACE_ARCH with $BC_CPUTYPE:
> http://www.prace-project.eu/PRACE-Common-Production
> https://github.com/Gregor-Mendel-Institute/env_init/blob/master/components.d/99-environment
> Recent Lmod work on "preloaded", "immutable" modules, may factorize this
> nicely.
> Thanks to pfo/azet for this! question here is: can we improve it to share it
> and how?
> I really applaud the fact they made a start and hope more will ride on this
> concept!
What do you want to organize here exactly? I'm missing context here...
> 5) better organize build environment variables
>
> Interactive builds should be a little bit better confined (or documented)
> for the unsuspecting newcomer. This relates both to satisfying PRACE mandates
> (or be able to - see above for the link) and/or, resolving issues like:
> https://github.com/hpcugent/easybuild-easyblocks/issues/100
> The end result is to have some mechanism to prevent the case of loading ictce
> and finding yourself building with gcc, as in:
> http://my.cdash.org/testDetails.php?test=11714674&build=504277
>
> Currently, $MPICC is a surprise exercise to get right (try `$MPICC --version`
> etc) on the shell,
> meanwhile EasyBuild appears to do the right thing when working in batch mode.
> I guess I am missing something, perhaps we just need documentation here.
> (OK, does the answer here relate to the stacked variables discussion via
> Lmod?)
The target to fix here is
https://github.com/hpcugent/easybuild-framework/issues/604 .
It shouldn't be too hard to fix that, someone just needs to pick it up and do
it.
Rerunning the full regtest will then provide more confidence that this does not
unintentionally break stuff.
I agree this is an important issue, but since neither we nor our users are
running into problems related to this, we're not spending time on it for now.
> 6) Implement common HPC policies
>
> This is kind of becoming production during this summer, ie. the idea of
> organizing
> your build-able applications sets into groups, corresponding to so-called
> policies:
> https://github.com/fgeorgatos/easybuild.experimental/tree/master/users/fgeorgatos/HPCBIOS
> If you go through the URL chain, you will eventually realize that these
> implement parts of:
> http://www.ccac.hpc.mil/consolidated/bc/policy.php # see FY06-01, FY06-05,
> later FY06-19/FY06-04
> The idea is, to be able to go to a given site, try `eb HPCBIOS_*` and get the
> world built.
> Anybody here caring to give to the existing ones a spin and provide feedback?
> Would you like to see them included under default EasyBuild repos?
If someone comes up with (versioned) HPCBIOS easyconfigs that group together
the must/should elements of particular HPCBIOS policies, we'll be happy to
include them in the easyconfigs repo and in the next EasyBuild release.
I don't see why not.
My proposal: version these easyconfigs by date (e.g.
HPCBIOS_Bioinfo_2013-08.eb), and maybe provide separate ones for MUST and
MUST+SHOULD...
Having some kind of standard w.r.t. naming of easyconfigs and organization
would really help imho.
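As a rough illustration of why a date-based naming scheme helps, the sketch
below parses names of the form HPCBIOS_<policy>_<YYYY-MM>.eb and picks the most
recent one for a policy. The exact pattern is an assumption derived from the
single example name above, and parse_name/latest are hypothetical helpers, not
EasyBuild functionality. How to distinguish MUST from MUST+SHOULD in the name
is left open here, since no convention has been agreed on.

```python
import re
from typing import NamedTuple, Optional

# Assumed pattern, based only on the example 'HPCBIOS_Bioinfo_2013-08.eb'.
NAME_RE = re.compile(
    r"^HPCBIOS_(?P<policy>[A-Za-z0-9]+)_(?P<year>\d{4})-(?P<month>\d{2})\.eb$"
)


class PolicyConfig(NamedTuple):
    policy: str
    year: int
    month: int


def parse_name(filename: str) -> Optional[PolicyConfig]:
    """Parse a date-versioned policy easyconfig name; None if it does not fit."""
    match = NAME_RE.match(filename)
    if match is None:
        return None
    month = int(match.group("month"))
    if not 1 <= month <= 12:
        return None
    return PolicyConfig(match.group("policy"), int(match.group("year")), month)


def latest(filenames, policy: str) -> Optional[str]:
    """Return the most recent file name for 'policy', or None if there is none."""
    candidates = []
    for name in filenames:
        parsed = parse_name(name)
        if parsed is not None and parsed.policy == policy:
            candidates.append((parsed.year, parsed.month, name))
    return max(candidates)[2] if candidates else None
```

With a fixed scheme like this, tooling can tell which policy snapshot is
current without opening the files.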
> 7) Common list of (bioinformatics?) applications of interest
>
> Last but not least:
> So far, I have been somewhat reluctant to create an online spreadsheet
> with all the bioinfo applications that people (may/might) care about,
> yet it would be such a pity to find (again) ourselves in the
> funny position of 2-3 centers doing the exact same work in parallel;
> yet this happened a few times (eg. MUSCLE, Rosetta, scons, UDUNITS...)
> So, how would you like to handle it to avoid duplication of work?
> How many of you care about this at all?
The ideal situation would be that:
a) someone who starts working on some app opens an issue first expressing
interest, and stating that he has started working on it
b) someone who wants to start working on some app checks whether an issue is
open first
However, this requires discipline, and there's no way to force people to do
this (and hence, we don't do it ourselves).
Communication is the only way to avoid duplication of work, but as long as
we're not forced in some way, it's not going to happen.
Maybe we'll just have to live with it, and use it to our advantage, i.e. if it
turns out that two people are working on the same stuff, review each other's
work, find a common ground and then either work together on it or let one of
the two finish the job.
I'm afraid there's no silver bullet here...
What might help is an eb feature that allows one to crawl all EasyBuild
(experimental) repositories and branches in search for easyconfig files for a
particular software package/version.
But again, that requires discipline from the user: search first, then start
working (after opening an issue so others are aware).
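Just to sketch what such a search could look like (this is not an existing eb
feature): the Python snippet below walks a set of locally cloned repositories
and lists easyconfig files for a given software name and, optionally, version.
It assumes the repositories are already checked out on disk, that file names
start with <name>-<version> and end in .eb, and it only looks at the branch
that is currently checked out. find_easyconfigs is a hypothetical name.

```python
import os


def find_easyconfigs(roots, name, version=None):
    """List .eb files under 'roots' whose names start with <name>-[<version>]."""
    prefix = f"{name}-{version}" if version else f"{name}-"
    hits = []
    for root in roots:
        for dirpath, dirnames, filenames in os.walk(root):
            # Do not descend into git metadata directories.
            dirnames[:] = [d for d in dirnames if d != ".git"]
            for filename in filenames:
                if not filename.endswith(".eb"):
                    continue
                stem = filename[: -len(".eb")]
                if version:
                    # Accept '<name>-<version>' alone or followed by '-<toolchain...>',
                    # so that version 1.2 does not also match 1.20.
                    matches = stem == prefix or stem.startswith(prefix + "-")
                else:
                    matches = stem.startswith(prefix)
                if matches:
                    hits.append(os.path.join(dirpath, filename))
    return sorted(hits)
```

Covering other branches and remote forks would need extra work on top of this,
e.g. by iterating over branches of each clone.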
regards,
Kenneth