first, best wishes for the new year!
my apologies, best wishes to you (and all others ;)

On 05 Jan, 2013, at 05:18, Stijn De Weirdt wrote:
disclaimer: i was never a big fan of the auto-download/source_urls option in EB 
for exactly this reason, ie sometimes EB will appear to be failing due to 
external reasons (however, the added user-friendliness far outweighs this)

A common case for "the impatient user" like me is that you parallelize builds
and end up overwriting the downloads (eg. GCC-Cloog fighting with plain 
GCC).
I have run into it a few times already, and it takes a while to understand 
what the issue is.
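For what it's worth, the overwrite race can be sidestepped without any mirroring at all: download to a unique temp file in the same directory and atomically rename it into place. A minimal sketch (a hypothetical helper, not EasyBuild's actual download code):

```python
# Hypothetical sketch (not EasyBuild's actual code): avoid the race by
# downloading to a unique temp file and atomically renaming it into place.
import os
import tempfile
import urllib.request

def fetch_atomic(url, dest):
    """Download url to dest without clobbering concurrent downloads."""
    dest_dir = os.path.dirname(dest) or "."
    if os.path.exists(dest):
        return dest  # someone else already finished this download
    # temp file lives in the same directory, so os.replace() stays atomic
    fd, tmp = tempfile.mkstemp(dir=dest_dir, suffix=".part")
    try:
        with os.fdopen(fd, "wb") as out, urllib.request.urlopen(url) as resp:
            out.write(resp.read())
        # atomic on POSIX: readers see either the old file or the new one,
        # never a half-written tarball
        os.replace(tmp, dest)
    finally:
        if os.path.exists(tmp):
            os.remove(tmp)
    return dest
```

With this pattern, two parallel builds fetching the same tarball at worst do the download twice; neither ever reads a torn file.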

Whatever solution we may pick, it will likely address this case nicely, too.
i'm not sure, but fixing fw issue #413 will ;)


For instance, if you try to build DOLFIN from scratch, you will quickly realize
that it is doomed to fail, since the source for MTL4/4.0.8878 is no longer 
available.
There are workarounds one can apply (eg. I now symlink the newer version,
faking it to be the older one), but these are all inferior to having a copy of 
the tarball.
please contribute the newer working .eb files so others don't run into this 
same issue

Done, along with its deps: easyconfigs/#76
thx!


Now, I understand that not everything will be possible, but I would really
like us to have a mirroring solution, at least for the open-source software 
codes.
i'm not sure what you mean with "open source", but afaik it is mainly the 
software license that defines who can distribute what.

Yeah, better to clarify: normally OSS codes, as per the OSI definition, allow 
unrestricted redistribution;
I hereby liberally put the OSS label on all such codes, but further 
clarification may be needed.
For now, let's focus on the codes that pose no problem for building a combined 
repo.
(whatever that definition means)
but something tells me that typically these codes can always be found 
somewhere, and the Intel example you gave and want solved is not such a case.

one thing that is missing to automate this is that the license is not part of 
the .eb file. if it were, we could certainly create a public repository with 
what we have collected. keeping it in sync with other sites is then a data 
management issue.


a) Have you heard of any other kind of "open source" registry project,
    perhaps something that we could piggyback on for registering specific 
tarballs?
most p2p tools offer something along these lines. (but i'm still waiting for a 
fuse interface ;)

FUSE and any filesystem will likely give you one more interface with potential 
new failure modes;
but we can borrow the idea and export the data via AFS, your p2p-FUSE code, 
and so on ;-)
ie. it could be yet another interface for exporting the bunch of files, and 
that's easily doable.

b) What technology do you think should be deployed? What are your preferences?
    (http, ftp, git, rsync, zsync ... whatever you think should be offered)

most have master/slave models, so that might not be what you are looking for 
(granted, you can write scripts to keep things in sync).
a better solution would be a distributed filesystem with one (or more) 
replicas per site.
i'm aware of a project called REDDnet/L-store that aims for something similar.
i'm not sure if accessing data from a WAN shared filesystem counts as 
distributing, so that might be a way out.

Do people prefer to have an exact local complete copy or rather a "subset 
cache"? your stance?
a full namespace cache, and then 2 modes: full cache or cache on access. (our 
current source dir is a bit less than 200GB, so at least a few sites should be 
able to provide full caches)
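That split is simple enough to sketch. Everything below is hypothetical (the mirror URL, cache layout, and namespace set are made up for illustration); the point is just the two modes over one shared namespace:

```python
# Hypothetical sketch of "full cache" vs "cache on access"; the mirror URL
# and cache layout are invented, not any existing EasyBuild mechanism.
import os
import urllib.request

MIRROR = "https://mirror.example.org/sources"        # hypothetical golden site
CACHE = os.path.expanduser("~/.local/easybuild/sources")

def get_source(filename, namespace, mirror=MIRROR, cache=CACHE):
    """'Cache on access' mode: return a local path, fetching on first use."""
    if filename not in namespace:
        raise KeyError("not in the source namespace: %s" % filename)
    local = os.path.join(cache, filename)
    if not os.path.exists(local):
        os.makedirs(cache, exist_ok=True)
        urllib.request.urlretrieve("%s/%s" % (mirror, filename), local)
    return local

def prefetch_all(namespace, mirror=MIRROR, cache=CACHE):
    """'Full cache' mode: walk the whole namespace up front."""
    return [get_source(f, namespace, mirror, cache) for f in sorted(namespace)]
```

Either way, every site holds the full namespace (the list of known filenames), so a miss is always distinguishable from a file that never existed.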


One aspect I like about zsync or git is that hashes make the copy mechanism 
irrelevant
(if there is an issue, you are going to catch it). With that property in 
place, anything goes!
ie. why limit ourselves? every HPC site could pick/introduce the technology 
it prefers.
because of the management part of data management. both zsync and git have 
master/client issues, so you will need to write quite a bit of code to keep 
those hidden from the end user, or set up a limited number of "golden" sites.

eg. hashes are fine, but how are we going to distribute the hashes in a secure 
way? use PKI and trust the golden sites?
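Distributing the hashes securely is indeed the hard part; the verification half is trivial. A sketch, assuming the expected digest arrived over some trusted channel (eg. signed by a golden site):

```python
# Sketch: once you trust the expected digest, the transport used to fetch
# the tarball no longer matters -- corruption or tampering is caught here.
import hashlib

def verify_sha256(path, expected_hex):
    """Return True iff the file at path hashes to expected_hex."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # hash in 1 MiB chunks so large tarballs don't need to fit in memory
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest() == expected_hex
```

This is exactly why the copy mechanism becomes irrelevant: any site can serve the bytes over any protocol, as long as the digest check passes at the receiving end.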


But I agree there are important design criteria to fulfill and the picture is 
not 100% clear.
We will likely start with something and fix as we go.
sure.


c) Is your preference to integrate this into easybuild or, perhaps, keep it 
orthogonal?
    (eg. someone could bootstrap .local/easybuild/source with zsync & then let 
it go)
i would keep it out of EB. integration should be provided, but it is a data 
management issue, and EB has enough problems to deal with as is. (but i'll be 
more than happy to test/contribute ;)

Good. Orthogonality has the advantage of being future-proof.

ps3.
Ubuntu's "LTS" lineage is a good example of a commendable vendor retention 
policy.
Somehow, not everybody around understands the universal need for LTS-style 
solutions...
what do you mean? LTS is only 5 years, hardly the lifetime of any long-running 
experiment ;)

IMHO, long-running experiments eventually have to face the fate of their 
obsolete hardware :-(
Here is a description of good design aspects of LTS: https://wiki.ubuntu.com/LTS
(I am not promoting LTS; RHEL has an even longer life cycle, I just don't have 
its commitments).

Anyway, the point above is that software providers are typically not as serious
as OS vendors about backward compatibility, at the expense of user time.
eg. Intel, with a 3-year window, may pull the plug on past compilers/MPI 
stacks etc.
If you know some contradicting story, let it be known...

thanks,
Fotis


