Hi all! There are some good ideas here. In my experience, communities have many options, and the right one really depends on what they are trying to optimise for.

I would start with CVMFS - it is a highly efficient way of delivering data and has very good versioning capabilities. You do need some infrastructure to use it, though you could also host one yourself as a project.

The promise of versioning for data in containers died prematurely with the halt of Flocker (https://github.com/ClusterHQ/flocker). It had great promise, but for "reasons" the project died. Pachyderm comes close in this space, I think (https://www.pachyderm.io/), but it's a domain-specific tool. I don't know whether it can be re-used for other purposes; I'd love to see someone try.

Finally, I've had some fun with https://data.world/ over the last few months. I've also heard some very good things about Azure's Machine Learning Dashboard (I think it's called that?), which has some good versioning functionality as well.

All in all, this is a really good thing to be discussing. Ideally, researchers should have infrastructure available to them to manage the versioning of their data. For those who aren't physicists (who have all the money and hence all the nice things), there is EUDAT, which provides a handle service for research data. One of EGI's data offerings, https://datahub.egi.eu, will soon be able to assign and manage PIDs associated with research data too. You can play around with these at the moment - just order them from the EGI or EOSC catalogue: marketplace.egi.eu or marketplace.eosc-hub.eu

Cheers!
Bruce

On Mon, 23 Jul 2018 at 23:28, Waldman, Simon <[email protected]> wrote:

> If they’re not changing very often, you could just use Git for that 😊
>
> *From:* [email protected] <[email protected]> *On Behalf Of *Terri Yu
> *Sent:* 23 July 2018 21:32
> *To:* discuss <[email protected]>
> *Subject:* Re: [discuss] Version control and collaboration with large datasets.
>
> I just realized that I don't really have many large files.
>
> I'm only using Git LFS on about 50 MB worth of files, and most of them are about 1 MB in size except for one 29 MB file. I don't know if Git LFS is the best option for my use case, but I was thinking ahead to when I might have more of those ~30 MB json data files.

------------------------------------------
The Carpentries: discuss
Permalink: https://carpentries.topicbox.com/groups/discuss/Tb776978a905c0bf8-M579b5f99185ab6b5282f7e45
