For my fluid dynamics simulations, I've been using git annex with cloudstor (https://www.aarnet.edu.au/network-and-services/cloud-services-applications/cloudstor; I've used the webdav interface, as that was easier than requesting S3 access to the files), which can be thought of as an Australia-wide owncloud. The outputs I produce are somewhere between a few hundred megabytes and a few tens of gigabytes (the size varies with how long the run is and how much debugging information is stored), stored in HDF5 files. I've found that when dealing with laptops which may not have large amounts of space for storing data, it is very useful to be able to grab only the files I need and have the rest stored on cloudstor (and on desktops and NASs, as you can instruct git annex to ensure there are at least N copies of each file stored).
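As a rough sketch of the workflow described above (not my exact setup), the git annex steps could be scripted along these lines. The remote name, the WebDAV URL, the file path and `encryption=none` are placeholders and assumptions for illustration; the helper functions are hypothetical and only build the command lines, running them only when `dry_run` is switched off.

```python
import subprocess
from typing import List

Command = List[str]


def annex_setup_commands(remote: str, webdav_url: str, min_copies: int) -> List[Command]:
    """One-off setup: enable git-annex, add a WebDAV special remote, set numcopies."""
    return [
        ["git", "annex", "init"],
        # The webdav special remote reads its credentials from the
        # WEBDAV_USERNAME and WEBDAV_PASSWORD environment variables.
        # encryption=none is an assumption; pick what suits your data.
        ["git", "annex", "initremote", remote, "type=webdav",
         f"url={webdav_url}", "encryption=none"],
        # Ask git-annex to refuse drops that would leave fewer than N copies.
        ["git", "annex", "numcopies", str(min_copies)],
    ]


def annex_store_commands(remote: str, path: str) -> List[Command]:
    """Track a new output file and upload its content to the remote."""
    return [
        ["git", "annex", "add", path],
        ["git", "commit", "-m", f"Add {path}"],
        ["git", "annex", "copy", "--to", remote, path],
    ]


def annex_fetch_commands(path: str) -> List[Command]:
    """Fetch the content of one file onto this machine (e.g. a laptop)."""
    return [["git", "annex", "get", path]]


def annex_free_commands(path: str) -> List[Command]:
    """Drop the local copy; git-annex checks numcopies before doing so."""
    return [["git", "annex", "drop", path]]


def run_all(commands: List[Command], dry_run: bool = True) -> None:
    """Print each command, and execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Placeholder values: substitute your own remote name, URL and files.
    run_all(annex_setup_commands("cloudstor", "https://example.invalid/webdav/sims", 2))
    run_all(annex_store_commands("cloudstor", "runs/run_001.hdf5"))
    run_all(annex_fetch_commands("runs/run_001.hdf5"))
    run_all(annex_free_commands("runs/run_001.hdf5"))
```

With the default `dry_run=True` this only prints the commands, so it is safe to run outside a repository to see what would happen.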
One thing you do want to ensure is that backups are being made of any data you produce. I've had friends lose terabytes of simulation runs because there was confusion over who was responsible for backups.

James

On 24 July 2018 at 08:14, Bruce Becker via discuss <[email protected]> wrote:

> Hi all!
>
> There are some good ideas here. My personal experience leads me to believe
> that there are many options for communities, but it really depends on what
> they try to optimise for.
>
> I would start with CVMFS - this is a highly efficient way of delivering
> data, and has very good versioning capabilities. You do however need some
> infrastructure to use it - but you could also host one yourself as a
> project.
>
> The promise of versioning for data in containers died prematurely with the
> halt of Flocker https://github.com/ClusterHQ/flocker - This had some
> great promise, but for "reasons" the project died.
>
> Pachyderm comes in close in this space I think -
> https://www.pachyderm.io/ - but it's a domain-specific tool. I don't know
> if it can be re-used for other purposes, I'd love to see someone try.
>
> Finally, I've had some fun with https://data.world/ over the last few
> months. I've heard some very good things about Azure's Machine Learning
> Dashboard (I think it's called that?) which has some good versioning
> functionality as well.
>
> All in all, this is a really good thing to be discussing. Ideally,
> researchers should have infrastructure available to them to manage the
> versioning of their data. For those who aren't physicists (who have all the
> money and hence all the nice things), there is EUDAT which provides a
> handle service to research data. One of EGI's data offerings
> https://datahub.egi.eu will soon be able to assign and manage PIDs
> associated with research data too.
>
> You can play around with these at the moment - just order them from the
> EGI or EOSC catalogue - marketplace.egi.eu or marketplace.eosc-hub.eu
>
> Cheers!
> Bruce
>
> On Mon, 23 Jul 2018 at 23:28, Waldman, Simon <[email protected]> wrote:
>
>> If they’re not changing very often, you could just use Git for that 😊
>>
>> *From:* [email protected] <[email protected]> *On Behalf Of* Terri Yu
>> *Sent:* 23 July 2018 21:32
>> *To:* discuss <[email protected]>
>> *Subject:* Re: [discuss] Version control and collaboration with large
>> datasets.
>>
>> I just realized that I don't really have many large files.
>>
>> I'm only using Git LFS on about 50 MB worth of files, and most of them
>> are about 1 MB in size except for one 29 MB file. I don't know if Git LFS
>> is the best option for my use case, but I was thinking ahead to when I
>> might have more of those ~30 MB json data files.
------------------------------------------
The Carpentries: discuss
Permalink: https://carpentries.topicbox.com/groups/discuss/Tb776978a905c0bf8-M106181f16a0ef0dcaa02da41
Delivery options: https://carpentries.topicbox.com/groups/discuss/subscription
