Hi Simon,

Thank you very much for your comments!

> I can see man years of effort being spent on solving this problem within 
> Galaxy.  I was going to title this email "Danger, Will Robinson", but I 
> didn't want to be disrespectful.  
> I think the path being embarked upon, tool dependency packaging, tool 
> versioning, reproducibility, and long term archive of source tarballs is 
> going to lead inevitably 
> to creation of a new Linux distribution, which I guess will be called Galaxy 
> Linux.

I'm not sure it is comparable to an entire Linux distribution; it's more
like an app store, such as PyPI, Bioconductor or RubyGems, and yes, that
is being reinvented to some extent. I want to point out that the pool of
bioinformatics applications is not that large compared to an entire
Linux distribution, and that many of them exist as precompiled binaries,
which makes everything easier.

> The packaging and archival you are talking about is exactly the service 
> provided by a Linux distribution.

Sorry, maybe I was misleading. I only want central storage for
binaries/tarballs whose upstream source cannot be trusted in the long
term. What 'long term' and 'trusted' mean still needs to be defined in
this discussion. I do not think we should copy Python packages that are
stored in PyPI; we should make it as easy as possible to install them in
our repository. If you do not trust PyPI, we can offer a mirror. The
same goes for gems.

But what about projects that do not keep older versions of their
packages? We should have a central place to store those, the UCSC tools
for example: easy to install, but we need to store them somewhere.

> There's well established infrastructure to handle this, and years of 
> experience have gone into solving the problems well.

Sure, we can learn from them, or use them.

> Surely the number of Linux distributions in the world now exceeds 100, but I 
> don't see that the world will become a better place if we increase that 
> number by one more.

I'm not talking about a new Linux distribution. Galaxy runs everywhere
(RHEL, OS X, SUSE, "whatever is used in the Amazon cloud"), and we need
to run Galaxy on top of all of that.

> We at AgResearch can't be alone in having to pick a Linux distribution to run 
> from the short list supported by our hardware vendor.  
> I can't see Galaxy Linux being on that list anytime soon.  So we have to make 
> Galaxy run on the particular distribution we have here.  For us that's CentOS 
> 6.

Sure, I agree.

> Now, I see scary mention of platform independence as a goal for Galaxy 
> packaging, which I interpret as "will run on any Linux distribution".  
> I think that's essentially infeasible.

I hope not :)
We should define a minimal set of dependencies that a Galaxy system
needs: Python, libz, gcc X.Y, libfortran and so on, and that's it. That
can be understood as a kind of abstraction layer. If your distribution
can offer it, Galaxy is supported; otherwise you need to provide such an
abstraction layer for your system yourself.
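To make the idea a bit more concrete, here is a rough sketch of how a
host could be checked against such a layer. It assumes the layer is just
a list of executables and shared libraries. The names below mirror the
examples above and are NOT an agreed Galaxy specification, and version
checks (the "X.Y" in gcc X.Y) are left out:

```python
import ctypes.util
import shutil

# Hypothetical minimal layer, mirroring the examples in the text
# (Python, libz, gcc, libfortran); not an official Galaxy requirement list.
REQUIRED_EXECUTABLES = ("python3", "gcc")
REQUIRED_LIBRARIES = ("z", "gfortran")


def missing_requirements(executables=REQUIRED_EXECUTABLES,
                         libraries=REQUIRED_LIBRARIES):
    """Return the required executables and shared libraries not found on this host."""
    # Executables are looked up on PATH.
    missing = [exe for exe in executables if shutil.which(exe) is None]
    # Libraries are looked up the way the platform's dynamic linker would find them.
    missing += ["lib" + lib for lib in libraries
                if ctypes.util.find_library(lib) is None]
    return missing


if __name__ == "__main__":
    missing = missing_requirements()
    if missing:
        print("Host lacks the minimal layer; missing:", ", ".join(missing))
    else:
        print("Host provides the minimal layer.")
```

If a distribution passes such a check, it would count as supported.
Otherwise the admin knows exactly what to provide locally.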

> All you can do is write install scripts which you hope are portable (by 
> following as many best practices as you know about), and then work patiently 
> with users on strange platforms, to adapt each install script to work on that 
> platform also.  I think this is not a good use of anyone's time.

I see your point, but as we would only support a minimal set of
requirements, that argument does not hold. Moreover, I do not expect
that issue in Galaxyland: we are dealing with professional
administrators and bioinformaticians running large clusters, not with
desktop users. I hope the set of distributions that are really in use
is small.

> How many Linux distributions do the Galaxy community actually care about 
> today?  The RHEL family is surely important, as is Ubuntu LTS.  Anything 
> else?  

Maybe a few Solaris systems, and do not forget OS X.

> I'd be quite interested to understand this, as it provides a context for the 
> discussion, and ensures we're not just solving a hypothetical problem.
> 
> I'm just starting work on a native packaging infrastructure for Galaxy, that 
> will enable tool dependencies to use defined versions of natively installed 
> packages.  
> That frees me up to make my packages work nicely on the RHEL family.  It 
> looks like the RPMs themselves (including SRPMs obviously) will be hosted by 
> the CentOS project before too long.
> Once they're there, they can easily be archived forever.  Anyone else on that 
> platform is welcome to use the same infrastructure.  
> Then, all we really need is someone to handle the packaging effort for the 
> other major Linux distributions (a small number, I hope), and the problem is 
> essentially solved.

Sure, but the problem is not really solved, is it? It's just transferred
from your Linux metaphor to 'packaging formats'. How many different
packaging formats do we have ... do we need a new one ...

> Getting the Bio-Linux team interested in multi-version packaging would be a 
> great next step.

I really think that is the important part! We need to convince others
and cooperate to build a truly multi-version packaging system. We should
have a look at Homebrew, sandboxed applications [1] and so on. But I
think James, Greg and co. have already done that, and we should now make
it possible to finally have a reproducible bioinformatics workbench.

> I'll be posting here when I have progress to report on my native packaging 
> effort.

That would be great. I really appreciate your thoughts, and I know I
might be a little bit too optimistic and idealistic.

Thanks,
Bjoern

[1] http://www.superlectures.com/guadec2013/sandboxed-applications-for-gnome

> cheers,
> Simon



___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
  http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/mailinglists/