(while waiting for some disk space to be freed ;)

disclaimer: i was never a big fan of the auto-download/source_urls option in EB for exactly this reason, i.e. sometimes EB will appear to fail for external reasons (however, the added user-friendliness far outweighs this)

> For instance, if you try to build DOLFIN from zero, you will quickly realize
> that that is doomed to fail, since source for MTL4/4.0.8878 is no longer available.
> There are work-arounds that someone can do (e.g. I now symlink the newer version,
> faking it to be older) but these are all inferior to having a copy of the
> tarball.

please contribute the newer working .eb files so others don't run into this same issue


## OSS

Open source packages are not immune to this either, e.g. finding the authoritative
source and latest version for mpi-BLAST will keep you wondering for a while.
(if you have that btw, how would you re-obtain it after all? :-P )
Even complete repos with history on GitHub can vanish from one day to the next.
I certainly don't have a good feeling about the sources of the bioinformatics codes!

## closed source

Finally, closed source packages are not handled any better. Especially compilers
are a critical dependency as regards reproducibility. I have opened two tickets
with Intel, asking about their support for older versions and what their stance is:
https://premier.intel.com/premier/IssueDetail.aspx?IssueID=691555 # icc/11.1.073
https://premier.intel.com/premier/IssueDetail.aspx?IssueID=691558 # impi/4.0.0.028
(can you read these issues with your own access, btw?)

From the latter ticket (IMPI), I received the following response yesterday:

Our official policy is to support two major versions back. At present, that 
includes Version 3.2 and Version 4.0, along with their corresponding updates.

This is actually bad news, because if the upstream provider deprives you of the
licensing (this particular issue was really about that!) it flushes away the whole
reproducibility argument, at least for newly arriving HPC sites (we wouldn't have
access to old versions).
Yeah, you can always rebuild from scratch with a newer version, yada, yada... OK.


# proposal

Now, I understand that not everything will be possible, but I would really
like us to have a mirroring solution, at least for the open source software codes.
i'm not sure what you mean by "open source", but afaik it is mainly the software license that defines who can distribute what.

Along with it, we would like to work on the SHA1/MD5 hashing business
(ie. ensure that the codes are the ones we claim they are).
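
To make the hashing idea concrete, here is a minimal sketch of checking a tarball against a recorded digest. The function names `file_digest` and `verify_tarball` are made up for illustration and are not part of EasyBuild; only the standard `hashlib` module is used, and the expected digest is assumed to come from some trusted record kept alongside the mirror.

```python
import hashlib
from pathlib import Path


def file_digest(path, algorithm="sha1", chunk_size=1 << 20):
    """Return the hex digest of the file at `path`, reading it in chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large tarballs never sit fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_tarball(path, expected_hex, algorithm="sha1"):
    """True if the tarball's digest matches the recorded one (hypothetical helper)."""
    return file_digest(path, algorithm) == expected_hex.strip().lower()
```

A mirror could publish one digest per tarball and have clients call `verify_tarball` after every download; which algorithm to record (SHA1, MD5 or something stronger) is left open here.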

a) Have you heard of any other kind of "open source" registry project,
    perhaps something that we could piggyback on for registering specific tarballs?
most p2p tools offer something along these lines. (but i'm still waiting for a FUSE interface ;)


b) What technology do you think should be deployed? What are your preferences?
    (http, ftp, git, rsync, zsync ... whatever you think should be offered)

most have master/slave models, so that might not be what you are looking for (granted, you can write scripts to keep things in sync). a better solution would be a distributed filesystem with one (or more) replicas per site. i'm aware of a project called REDDnet/L-store that aims for something similar. i'm not sure if accessing data from a WAN shared filesystem counts as distributing, so that might be a way out.

c) Is your preference to integrate this into easybuild or, perhaps, keep it orthogonal?
    (e.g. someone could bootstrap .local/easybuild/source with zsync & then let it go)
i would keep it out of EB. integration should be provided, but it is a data management issue, and EB has enough problems to deal with as is. (but i'll be more than happy to test/contribute ;)
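
As a rough illustration of keeping this orthogonal to EB, the sketch below fills a source directory from an ordered list of mirrors before any build starts. Everything in it is an assumption made for the example: the function name `fetch_with_fallback`, the layout of one flat directory per mirror, and whatever URLs are passed in. It uses only the Python standard library and does not touch EasyBuild itself.

```python
import shutil
import urllib.request
from pathlib import Path


def fetch_with_fallback(filename, base_urls, dest_dir):
    """Return the local path of `filename`, fetching it from the first mirror that has it."""
    dest = Path(dest_dir) / filename
    if dest.exists():
        # Already bootstrapped (e.g. by an earlier sync); nothing to download.
        return dest
    dest.parent.mkdir(parents=True, exist_ok=True)
    partial = dest.with_name(dest.name + ".part")
    for base in base_urls:
        url = base.rstrip("/") + "/" + filename
        try:
            with urllib.request.urlopen(url, timeout=30) as response:
                with open(partial, "wb") as out:
                    shutil.copyfileobj(response, out)
        except OSError:
            # URLError and HTTPError are OSError subclasses: try the next mirror.
            partial.unlink(missing_ok=True)
            continue
        # Only expose the file under its final name once it is complete.
        partial.replace(dest)
        return dest
    raise FileNotFoundError(f"{filename} not found on any of {len(base_urls)} mirrors")
```

Pointing `dest_dir` at something like .local/easybuild/source would let a later build find the tarball locally, so the upstream URL going away no longer matters. A real deployment would presumably also verify a checksum before accepting the file.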


ps.
We are going to implement something anyway on our end (git+http+zsync looks attractive),
HPCBIOS should cater for this, so the more that get interested in it, the merrier.

ps2.
Eventually we could come up with some kind of solution for the non-OSS codes too,
yet I can safely predict that vendor licensing may put limits on what is doable.
(github would not be the spot for that kind of stuff, in any case, btw)

ps3.
Ubuntu's "LTS" lineage is a good example of a commendable vendor retention policy.
Somehow, not everybody around understands the universal need for LTS style solutions...
what do you mean? LTS is only 5 years, hardly the lifetime of any long-running experiment ;)



thanks for looking into this,

Fotis


