(while waiting for some disk space to be freed ;)
disclaimer: i was never a big fan of the auto-download/source_urls
option in EB for exactly this reason, i.e. sometimes EB will appear to be
failing for purely external reasons (however, the added
user-friendliness far outweighs this)
> For instance, if you try to build DOLFIN from scratch, you will quickly realize
that it is doomed to fail, since the source for MTL4/4.0.8878 is no longer
available.
There are workarounds (e.g. I now symlink the newer version, faking it to be
the older one), but these are all inferior to having a copy of the
tarball.
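
For reference, the symlink workaround mentioned above could be scripted roughly
as follows. This is only a sketch: the function name and the tarball file names
in the usage note are hypothetical, and it assumes the newer tarball already
sits in the source directory.

```python
import os
from pathlib import Path


def fake_older_tarball(source_dir, newer_name, older_name):
    """Symlink older_name -> newer_name inside source_dir; return the link path."""
    source_dir = Path(source_dir)
    newer = source_dir / newer_name
    if not newer.is_file():
        raise FileNotFoundError(f"newer tarball not found: {newer}")
    link = source_dir / older_name
    if link.is_symlink() or link.exists():
        # Never clobber a real copy (or an existing link) of the older tarball.
        return link
    # A relative target keeps the link valid if source_dir is moved or rsynced.
    os.symlink(newer_name, link)
    return link
```

Something like `fake_older_tarball(src, "pkg-2.0.tar.gz", "pkg-1.0.tar.gz")`
would do it (file names made up). Note that the build then silently uses the
newer code, which is exactly why this is inferior to a real copy of the tarball.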
please contribute the newer working .eb files so others don't run into
this same issue
## OSS
Open source packages are not immune to this either, e.g. finding the authoritative
source and latest version of mpi-BLAST will keep you wondering for a while.
(if you have that btw, how would you re-obtain it after all? :-P )
Even complete repos with history on GitHub can vanish from one day to the next.
I certainly don't have a good feeling about the sources of the bioinformatics
codes!
## closed source
Finally, closed source packages fare no better. Compilers in particular
are a critical dependency for reproducibility. I have opened two tickets
with Intel, asking about their support for older versions and what their stance is:
https://premier.intel.com/premier/IssueDetail.aspx?IssueID=691555 # icc/11.1.073
https://premier.intel.com/premier/IssueDetail.aspx?IssueID=691558 # impi/4.0.0.028
(can you read these issues with your own access, btw?)
From the latter ticket (IMPI), I received the following response yesterday:
Our official policy is to support two major versions back. At present, that
includes Version 3.2 and Version 4.0, along with their corresponding updates.
This is actually bad news, because if the upstream provider deprives you of the
license (this particular issue was really about that!), the whole
reproducibility argument goes out the window, at least for newcomer HPC sites
(we wouldn't have access to old versions).
Yeah, you can always rebuild from scratch with a newer version, yada, yada...
OK.
# proposal
Now, I understand that not everything will be possible, but I would really
like us to have a mirroring solution, at least for the open source codes.
i'm not sure what you mean by "open source", but afaik it is mainly
the software license that defines who can distribute what.
Along with it, we would like to work on the SHA1/MD5 hashing business
(i.e. ensure that the codes are the ones we claim they are).
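
As a rough illustration of the hashing idea, a tarball could be checked against
a published digest along these lines. It is a minimal sketch using Python's
standard hashlib; the function names are made up for this example and are not
part of EB.

```python
import hashlib


def file_digest(path, algorithm="sha1", chunk_size=1 << 20):
    """Return the hex digest of a file, read in chunks to keep memory use flat."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_tarball(path, expected, algorithm="sha1"):
    """True if the file's digest matches the expected hex string (case-insensitive)."""
    return file_digest(path, algorithm) == expected.strip().lower()
```

The same two functions work for "md5" or "sha256" by changing the algorithm
argument; the expected digests would have to come from a trusted registry,
otherwise the check only proves that the mirror agrees with itself.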
a) Have you heard of any other kind of "open source" registry project,
perhaps something that we could piggyback on for registering specific tarballs?
most p2p tools offer something along these lines. (but i'm still waiting
for a fuse interface ;)
b) What technology do you think should be deployed? What are your preferences?
(http, ftp, git, rsync, zsync ... whatever you think should be offered)
most have master/slave models, so that might not be what you are looking
for (granted, you can write scripts to keep things in sync).
a better solution would be a distributed filesystem with one (or more)
replicas per site.
i'm aware of a project called REDDnet/L-store that aims for something
similar.
i'm not sure if accessing data from a WAN shared filesystem counts as
distributing, so that might be a way out.
c) Is your preference to integrate this into EasyBuild or, perhaps, to keep it
orthogonal?
(eg. someone could bootstrap .local/easybuild/source with zsync & then let
it go)
i would keep it out of EB. integration should be provided, but it is a
data management issue, EB has enough problems to deal with as is. (but
i'll be more than happy to test/contribute ;)
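
To make the "keep it orthogonal" option in (c) concrete, here is a small sketch
of a stand-alone bootstrap step that fills the source directory from a mirror
and then gets out of the way. It assumes the mirror is reachable as a local
directory (e.g. a mounted replica); the function name and the mount point in
the usage note are hypothetical, and zsync/http transport is left out.

```python
import shutil
from pathlib import Path


def bootstrap_sources(mirror_dir, source_dir):
    """Copy files present in mirror_dir but missing from source_dir.

    Returns the relative paths of the files that were copied.
    """
    mirror_dir, source_dir = Path(mirror_dir), Path(source_dir)
    copied = []
    for src in sorted(mirror_dir.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(mirror_dir)
        dst = source_dir / rel
        if dst.exists():
            continue  # keep whatever the site already has
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)
        copied.append(str(rel))
    return copied
```

A site could run something like
`bootstrap_sources("/mnt/mirror/sources", Path.home() / ".local/easybuild/source")`
once and then let EB carry on as usual, since EB only sees a source directory
that already has the tarballs in it.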
ps.
We are going to implement something on our end anyway (git+http+zsync looks
attractive); HPCBIOS should cater for this, so the more people get interested
in it, the merrier.
ps2.
Eventually we could come up with some kind of solution for the non-OSS codes as
well, yet I can safely predict that vendor licensing may put limits on what is
doable.
(GitHub would not be the place for that kind of stuff in any case, btw)
ps3.
Ubuntu's "LTS" lineage is a good example of a commendable vendor retention
policy. Somehow, not everybody around understands the universal need for
LTS-style solutions...
what do you mean? LTS is only 5 years, hardly the lifetime of any
long-running experiment ;)
thanks for looking into this,
Fotis