On Friday, 1 January 2016, Björn Grüning <bjoern.gruen...@gmail.com> wrote:

> Hi Galaxy developers,
> this is an RFC to get the implementation details right for a new action
> type in `tool_dependencies.xml`.
> For years we have been trying to solve a crucial sustainability problem:
>   **Non-sustainable links**!
> A little bit of history
> ------------------------
> At first we tried to [mirror
> tarballs](https://github.com/bgruening/download_store) from sources with
> questionable sustainability, like BioC or random FTP servers.
> But over time we encountered many more places that we cannot trust:
> Google Code, SourceForge, etc.
> We tried to mirror the entire BioC history by tracking down the SVN
> history and creating a tarball for every revision ... a Herculean task ...
> but still limited in scope because there are so many other things that
> need to be archived to make Galaxy and all tools sustainable.
> In the end we settled on the simplest solution: provide a community
> archive where everyone can drop tarballs that they want to be
> sustainable. The Galaxy Project is generously funding the
> storage, but we have plans to mirror and distribute the workload to
> universities and other institutes that want to help.
> The biggest problem we needed to solve was the access to the archive.
> Who can drop tarballs? How do we control access to prevent abuse of this
> system?
> We went ahead and created the Cargo-Port:
>     https://github.com/galaxyproject/cargo-port
> Access will be controlled by the community via PRs. Add your package
> and we will check the content (hopefully) automatically and the tarball
> will be mirrored to a storage server.
> ---
> So far so good. This RFC is about the usage of Cargo-Port inside of
> Galaxy. I would like to propose a new action type that uses the
> Cargo-Port directly. It should replace `<action type="download_by_url"
> sha256sum="6387238383883...">` and `<action type="download_file">` and
> offer a more transparent and user-friendly solution.
> The current state of the art is quite cumbersome since we need to
> manually generate the checksum, provide the correct link,
> and get the same information into Cargo-Port. I would like to streamline
> this a little bit and use this as a good opportunity
> to work on and fix https://github.com/galaxyproject/galaxy/issues/896.
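
For reference, the manual checksum step mentioned above can be done with a few lines of Python. This is only a minimal sketch using the standard library; the function name `sha256_of_file` is made up for illustration and is not part of Galaxy or Cargo-Port.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks.

    Reading in chunks keeps memory use flat even for large tarballs.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns empty bytes (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The resulting hex string is the kind of value that currently has to be pasted by hand into the `sha256sum` attribute.
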
> Proposal `<action type="download_by_proxy">`:
>  * attributes for Id, Version, Platform, Architecture
>  * no URL, no checksum
>  * attribute for the URL to cargo-port/urls.tsv
>    * default to the current github repo
>    * configurable via galaxy.ini
>  * this action will more or less trigger this curl command: `$ curl
> https://raw.githubusercontent.com/galaxyproject/cargo-port/master/gsl.py
> | python - --package_id augustus_3_1`
>    * which gives us the freedom to change API, columns ... in Cargo-Port
> without updating Galaxy core
>    * the only API that needs to stay stable is `gsl`
>  * `gsl` will try to download from the original URL, specified in
> Cargo-Port. If this does not work we will download our archived one.
>  * Changing the current working dir? Is this what we want, e.g.
> automatically uncompress and change cwd like `download_by_url`?
>    * We will need an attribute to not uncompress. A few tools need the
> tarballs left unextracted.
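
To make the fallback behaviour in the proposal concrete, here is a rough sketch of "try the original URL first, then the archived copy", with an optional sha256 check. It is not the actual `gsl` code: the function names, the injectable `fetch` parameter, and the idea of passing the URLs as an ordered list are all assumptions made for illustration.

```python
import hashlib
import urllib.request
from typing import Callable, Iterable, Optional


def fetch_url(url: str, timeout: float = 30.0) -> bytes:
    """Download a URL and return its body as bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def download_with_fallback(
    urls: Iterable[str],
    expected_sha256: Optional[str] = None,
    fetch: Callable[[str], bytes] = fetch_url,
) -> bytes:
    """Try each URL in order and return the first acceptable payload.

    A candidate is skipped if the fetch raises, or if expected_sha256 is
    given and the payload's digest does not match it.
    """
    errors = []
    for url in urls:
        try:
            data = fetch(url)
        except Exception as exc:  # network errors, HTTP errors, timeouts
            errors.append(f"{url}: {exc}")
            continue
        if expected_sha256 is not None:
            digest = hashlib.sha256(data).hexdigest()
            if digest != expected_sha256.lower():
                errors.append(f"{url}: checksum mismatch ({digest})")
                continue
        return data
    raise RuntimeError("all sources failed:\n" + "\n".join(errors))
```

A caller would pass the upstream URL first and the archived copy second, so the archive is only contacted when upstream is gone or serves a file that no longer matches the recorded checksum.
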
> Single Point of Failure - a small remark
> ----------------------------------------
> Previously, Galaxy packages relied entirely on the kindness of upstream
> to maintain existing packages indefinitely. Obviously not a sustainable
> practice. Every time a tarball was moved, we had to hope one of us
> retained a copy so that we could ensure reproducibility. With the advent
> of the Cargo Port, we now maintain a complete, redundant copy of every
> upstream tarball used in IUC and devteam repositories, additionally
> adding sha256sums for every file to ensure download integrity. The
> community is welcome to request that files they use in their packages be
> added as well. We believe this will help combat the single point of
> failure by providing at least one level of duplication. The Cargo Port
> is considering plans to provide mirrors of itself to various
> universities as another layer of redundancy.
> Thanks for reading and we appreciate any comments.
> Eric, Nitesh & Bjoern
> -- https://gist.github.com/bgruening/48297c27cd72cbadea7a
Maybe a question for Nitesh,

Would this replace or coexist with the related, but narrower in scope,
Bioarchive project?

