On Aug 26, 2013, at 11:59 AM, James Taylor wrote:
> On Mon, Aug 26, 2013 at 11:48 AM, John Chilton <chil...@msi.umn.edu> wrote:
>> I think it is interesting that there was push back on providing
>> infrastructure (tool actions) for obtaining CBL from github and
>> performing installs based on it because it was not in the tool shed
>> and therefore less reproducible, but the team believes infrastructure
>> should be put in place to support pypi.
> Well, first, I'm not sure what "the team" believes; I'm stating what I
> believe and engaging in a discussion with "the community". At some
> point this should evolve into what we are actually going to do and be
> codified in a spec as a Trello card, which even then is not set in
> stone.
>
> Second, I'm not suggesting we depend on PyPI. The nice thing about the
> second format I proposed on galaxy-dev is that we can easily parse out
> the URL and archive that file. Then someday we could provide a
> fallback repository where if the PyPI URL no longer works we still
> have it stored.
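The parse-the-URL-and-archive idea with a fallback repository could be sketched roughly as below. This is only an illustration; the names (`ARCHIVE_BASE`, `candidate_urls`, `fetch_with_fallback`) and the fallback mirror URL are all invented, not anything that exists in Galaxy or the Tool Shed today.

```python
# Illustrative sketch: try the canonical upstream URL first, then fall
# back to an archived copy keyed by filename. All names here are invented.
import os
import urllib.request

ARCHIVE_BASE = "https://depot.example.org/archive"  # hypothetical fallback mirror

def candidate_urls(url, archive_base=ARCHIVE_BASE):
    """Attempt order: upstream first, then the archive keyed by filename."""
    return [url, "%s/%s" % (archive_base, os.path.basename(url))]

def fetch_with_fallback(url, dest):
    """Download from the first candidate URL that still works."""
    for candidate in candidate_urls(url):
        try:
            urllib.request.urlretrieve(candidate, dest)
            return candidate
        except OSError:
            continue
    raise RuntimeError("unable to fetch %s from upstream or archive" % url)
```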
I concur here; the experience and lessons learned by long-established package
and dependency managers can provide some useful guidance for us going forward.
APT has long relied on a model of archiving upstream source (as well as
distro-generated binary (dpkg) packages), cataloging changes as a set of
patches, and maintaining an understanding of installed files, even those meant
to be user-edited. I think there is a strong advantage for us doing this as
well.
>> I think we all value reproducibility here, but we make different
>> calculations on what is reproducible. I think in terms of implementing
>> the ideas James has laid out or similar things I have proposed, it
>> might be beneficial to have some final answers on what external
>> resources are allowed - both for obtaining a Galaxy IUC gold star and
>> for the tool shed providing infrastructure to support their usage.
> My focus is ensuring that we can archive things that pass through the
> toolshed. Tarballs from *anywhere* are easy enough to deal with.
> External version control repositories are a bit more challenging,
> especially when you are pulling just a particular file out, so that's
> where things got a little hinky for me.
> Since we don't have the archival mechanism in place yet anyway, this
> is more a philosophical discussion and setting the right precedent.
> And yes, keeping an archive of all the software in the world is a
> scary prospect, though compared to the amount of data we currently
> keep for people it is a blip. And I'm not sure how else we can really
> achieve the level of reproducibility we desire.
One additional step that will assist with long-term archival is generating
static metadata and allowing the packaging and dependency systems to work
outside of the Galaxy and Tool Shed applications. A package metadata catalog
and package format that describe packages on a generic webserver and can be
installed without a running Galaxy instance are components that I believe are
fairly important.
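To make the idea concrete, a static catalog could be as simple as a JSON file served by any webserver, resolved by a standalone installer. The schema below is entirely invented for illustration; it is not an existing Tool Shed format.

```python
# Illustrative sketch only: a flat, static package catalog that a generic
# webserver could host and a standalone tool could resolve against without
# a running Galaxy instance. The schema is invented for illustration.
import json

catalog = {
    "packages": [
        {
            "name": "samtools",
            "version": "0.1.19",
            "source_url": "http://downloads.example.org/samtools-0.1.19.tar.bz2",
            "sha256": "<checksum of the archived tarball>",
            "dependencies": ["zlib/1.2.8"],
        }
    ]
}

def find_package(catalog, name, version):
    """Resolve a name/version pair against the static catalog."""
    for pkg in catalog["packages"]:
        if pkg["name"] == name and pkg["version"] == version:
            return pkg
    return None
```

Because the catalog is plain data, it can be mirrored, diffed, and archived independently of any application.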
As for user-edited files, the env.sh files, which are generated at install time
and then essentially untracked afterward, scare me a bit. I think it'd be
useful for the packaging system to have a tighter concept of environment file
management.
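One dpkg-inspired way to get a tighter grip on env.sh files would be to record an install-time checksum of each generated file, so that later local edits can at least be detected. A minimal sketch, with an invented manifest layout:

```python
# Hypothetical sketch: track generated files (e.g. env.sh) the way dpkg
# tracks conffiles, by recording an install-time checksum in a manifest.
# The manifest layout here is invented for illustration.
import hashlib
import json
import os

def _sha256(path):
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

def record(manifest_path, tracked_path):
    """Store the install-time checksum of a generated file."""
    manifest = {}
    if os.path.exists(manifest_path):
        with open(manifest_path) as f:
            manifest = json.load(f)
    manifest[tracked_path] = _sha256(tracked_path)
    with open(manifest_path, "w") as f:
        json.dump(manifest, f)

def modified_since_install(manifest_path, tracked_path):
    """True if the file has been edited since its checksum was recorded."""
    with open(manifest_path) as f:
        manifest = json.load(f)
    return manifest.get(tracked_path) != _sha256(tracked_path)
```

With that in place, an upgrade could warn before clobbering a user-edited env.sh rather than silently regenerating it.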
These are just my opinions, of course, and are going to be very APT/dpkg-biased
simply due to my experience with and favor for Debian-based distros and
dependency/package management, but I think there are useful concepts in this
(and other) systems that we can draw from.
Along those lines, one more idea I had thrown out a while ago was coming up
with a way to incorporate (or at least automatically process so that we can
convert to our format) the build definitions for other systems like MacPorts,
BSD ports/pkgsrc, dpkg, rpm, etc. so that we can leverage the existing rules
for building across our target platforms that have already been worked out by
other package maintainers with more time. I think this aligns pretty well with
Brad's thinking with CloudBioLinux, the difference in implementation being that
we require multiple installable versions and platform independence.
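As a taste of what automatically processing other systems' build definitions could involve, here is a deliberately minimal parser that pulls build dependencies out of a Debian control stanza. Real control files have more fields, continuation lines, and alternatives (`|`) than this handles; it is a sketch of the idea, not a working converter.

```python
# Illustrative only: extract build dependencies from a Debian control
# stanza as a first step toward converting an upstream build definition
# into another package format. Intentionally minimal; real control files
# have continuation lines and alternative dependencies this ignores.
def parse_build_depends(control_text):
    for line in control_text.splitlines():
        if line.startswith("Build-Depends:"):
            deps = line.split(":", 1)[1]
            # Strip version constraints like "(>= 1.2)" for a plain name list.
            return [d.split("(")[0].strip() for d in deps.split(",")]
    return []
```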
I am a bit worried that as we go down the "repackage (almost) all dependencies"
path (which I do think is the right path), we also run the risk of most of our
packages being out of date. That's almost a guaranteed outcome when even the
huge packaging projects (Debian, Ubuntu, etc.) are rife with out-of-date
packages. So being able to incorporate upstream build definitions may help us
package dependencies quickly.
Please keep all replies on the list by using "reply all"
in your mail client. To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
To search Galaxy mailing lists use the unified search at: