On Tue, Sep 17, 2013 at 10:05 AM, Bjoern Gruening
<bjoern.gruen...@gmail.com> wrote:
> Hi,
>
> I want to start a discussion about the storage of tarballs to guarantee
> the availability to some degree. Currently, I store most of my tarballs
> in my github account, if I do not trust the official ftp/http server.
> Does anyone have the same problems/concerns? Is there an official
> guideline/recommendation?
>
> James raised that topic in one thread ... and I really want to see that
> happen, to some extent:
>
> http://dev.list.galaxyproject.org/tool-dependencies-xml-format-tp4661410p4661415.html
>
>
> I know it is an ambiguous task, but if Galaxy is to be a reproducible
> system we need to think about that issue, discuss it and make a clear
> statement how far we want to go, what is feasible and what is not.
> Downstream it has many implications, already now for a few IUC members.
> For example it is hard to tell tool developers about reproducible
> tool_dependencies if no clear statement is ever made.
>
> A few problems that I encountered during tool development:
>
> - 'no stable links': tarballs on a 'lab'-website will change their links
> or delete old versions of tarballs

On the timescale of years I've seen that happen. Just recently
for example the NumPy team removed the files for some beta
and release candidates from their SourceForge download page:
http://mail.scipy.org/pipermail/numpy-discussion/2013-September/067690.html

> - github: I'm really not sure if the 'raw' API I use for fetching single
> files or tarballs from my github account is stable and will remain. I
> also think I cannot put GBs of tarballs in my github account, but
> currently it's the best option I have

A related issue with pointing at specific GitHub (or BitBucket)
commits is that sometimes a project rewrites their repository
(although to be clear this is bad practice and should be rare).

> - Sometimes you need to apply patches, these need to be stored
> somewhere.

See my reply below about using the current Tool Shed system.

> - If I store arbitrary tarballs in my github account and the
> installation routine in Galaxy, the users of my tools need a huge level
> of trust in my work. Moreover, the IUC can hardly control that (md5
> checksums, next to each tarball?)
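
Recording a checksum next to each download URL would at least let the
install step detect a changed or corrupted tarball. A minimal sketch in
Python, assuming the expected digest is published alongside the tarball;
the function names are hypothetical and not part of Galaxy or the Tool
Shed:

```python
import hashlib


def file_digest(path, algorithm="md5", chunk_size=1 << 20):
    """Return the hex digest of the file at `path`, reading it in chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large tarballs need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_tarball(path, expected_hex, algorithm="md5"):
    """Return True if the file's digest matches the published checksum."""
    return file_digest(path, algorithm) == expected_hex.strip().lower()
```

MD5 is enough to catch accidental changes; passing algorithm="sha256"
would be the safer choice against deliberate tampering.
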
>
>
> In my opinion we need a central storage, where we can put our tarballs
> and so on. (mirrored ...)
>
> Some ideas:
>
> Two separated tool shed areas for one account:
> 1. version controlled
> 2. non-version controlled for tarballs and redirection files/rules, to
> redirect old links, maybe even redirect old repositories to new ones
> (assuming the history is the same and so on?)
>
>
> FTP Server with a few limitations, like file-size and authentication to
> make illegal file sharing harder.
>
>
> Ask the github guys if they are willing to support us?
>
>
> Build on top of Open Data initiatives, like the Open Data Portal in
> Switzerland: http://www.bar.admin.ch/themen/01648/?lang=en
>
>
> Any comments, ideas?
> Cheers,
> Bjoern
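
On mirroring: if a tarball were available from more than one location,
the install step could try each in turn. A rough sketch in Python,
assuming a plain ordered list of URLs (the function name and the
example.org addresses are placeholders, not a real Galaxy API):

```python
import urllib.request


def fetch_first_available(urls, fetch=None):
    """Try each URL in order; return (url, data) for the first that works."""
    if fetch is None:
        def fetch(url):
            # Default fetcher: plain HTTP/FTP download via the standard library.
            with urllib.request.urlopen(url, timeout=30) as response:
                return response.read()
    errors = []
    for url in urls:
        try:
            return url, fetch(url)
        except (OSError, ValueError) as exc:
            # URLError and HTTPError are OSError subclasses; a malformed
            # URL raises ValueError. Record the failure and try the next one.
            errors.append(f"{url}: {exc}")
    raise RuntimeError("all download locations failed:\n" + "\n".join(errors))


# Hypothetical usage: upstream first, then a central mirror.
# url, data = fetch_first_available([
#     "http://upstream.example.org/tool-1.0.tar.gz",
#     "http://mirror.example.org/tool-1.0.tar.gz",
# ])
```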

I see some parallels with the Galaxy egg cache, and also other
data files which the Galaxy team is hosting. These are all
centrally managed by the Galaxy team, which is a bottleneck.

Patches and even smaller 3rd party tarballs could easily be included
in the Tool Shed repository, except for the restriction that a
Tool dependency definition may currently only hold a single
tool_dependencies.xml file.

Regards,

Peter