Hi all!

There are some good ideas here. My personal experience leads me to believe
that there are many options for communities, but it really depends on what
they try to optimise for.

I would start with CVMFS - this is a highly efficient way of delivering
data, and has very good versioning capabilities. You do, however, need some
infrastructure to use it, although you could also host it yourself as a
project.

The promise of versioning for data in containers died prematurely with the
halt of Flocker (https://github.com/ClusterHQ/flocker). It had great
potential, but for "reasons" the project was discontinued.

Pachyderm comes close in this space, I think (https://www.pachyderm.io/),
but it's a domain-specific tool. I don't know if it can be re-used for
other purposes; I'd love to see someone try.

Finally, I've had some fun with https://data.world/ over the last few
months. I've heard some very good things about Azure's Machine Learning
Dashboard (I think it's called that?) which has some good versioning
functionality as well.

All in all, this is a really good thing to be discussing. Ideally,
researchers should have infrastructure available to them to manage the
versioning of their data. For those who aren't physicists (who have all the
money and hence all the nice things), there is EUDAT, which provides a
handle service for research data. One of EGI's data offerings
(https://datahub.egi.eu) will soon be able to assign and manage PIDs
associated with research data too.

You can play around with these right now: just order them from the EGI
or EOSC catalogue (marketplace.egi.eu or marketplace.eosc-hub.eu).

Cheers!
Bruce


On Mon, 23 Jul 2018 at 23:28, Waldman, Simon <[email protected]> wrote:

> If they’re not changing very often, you could just use Git for that 😊
>
>
>
> *From:* [email protected] <[email protected]> *On Behalf Of *Terri Yu
> *Sent:* 23 July 2018 21:32
> *To:* discuss <[email protected]>
> *Subject:* Re: [discuss] Version control and collaboration with large
> datasets.
>
>
>
> I just realized that I don't really have many large files.
>
>
>
> I'm only using Git LFS on about 50 MB worth of files, and most of them are
> about 1 MB in size except for one 29 MB file. I don't know if Git LFS is
> the best option for my use case, but I was thinking ahead to when I might
> have more of those ~30 MB json data files.

------------------------------------------
The Carpentries: discuss
Permalink: 
https://carpentries.topicbox.com/groups/discuss/Tb776978a905c0bf8-M579b5f99185ab6b5282f7e45
Delivery options: https://carpentries.topicbox.com/groups/discuss/subscription
