Thank you for re-raising DVC. I hadn't looked seriously at it, but it seems a nice balance between git LFS and git annex. I'm recommending that our team look at it for potential inclusion in Gigantum.
While I'm at it, I think now is also a good time to pitch the graphical tool we're building: https://gigantum.com

I'm a little hesitant, but I've checked in person with a few deeply invested folks in the community, and it seems reasonable to "advertise" this on the list. Our client is open source (and always will be), and we are hoping to build a sustainable model where scientists and educators can use the thing for free (much like GitHub has done, though I understand not everyone loves GitHub either).

We are still in beta, but we've put together something that we believe does a good job of managing git and Docker, along with a cloud synchronization back-end. While this may rankle more experienced coders, I think it can empower folks who won't (or can't) learn the whole set of data / software management skills. Part of being in beta is that the feature set is still somewhat open, and I'd truly love to get input from other Carpentries instructors on this. One concrete idea: you could save 1/4 of the time if you don't need to teach command-line git, and then perhaps you could get more students to the point where they write good functions.

Prompted by this thread, the data management piece is of particular interest. Currently, we have a relatively coarse set of options: each project is scoped to a Docker bind-mounted folder on your host OS. There are "input" and "output" directories there, managed via git LFS by default. However, you can also disable tracking of these directories and then manage them however you like. (You shouldn't currently use a strategy that extends the existing git repo, like doing LFS yourself, though now that I think of it, manual git annex MIGHT work... I'll have to check when I get some time.) One of the major hobgoblins is the Windows filesystem, of course, and we could potentially eliminate that by shifting to Docker volumes instead of bind mounts (but then you lose host OS access).
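For anyone less familiar with the bind-mount versus volume distinction, here is a minimal sketch of how the two differ, expressed as the `docker run --mount` arguments each would use. The image name, host path, volume name, and container path are hypothetical placeholders, and this is not meant to describe how Gigantum itself launches containers.

```python
from typing import List


def mount_spec(kind: str, source: str, target: str) -> str:
    """Build the value for `docker run --mount`.

    kind   -- "bind" (a host folder) or "volume" (a Docker-managed volume)
    source -- host path for a bind mount, or volume name for a volume
    target -- path inside the container
    """
    if kind not in ("bind", "volume"):
        raise ValueError(f"unsupported mount type: {kind!r}")
    return f"type={kind},source={source},target={target}"


def docker_run_command(image: str, kind: str, source: str,
                       target: str = "/mnt/project") -> List[str]:
    """Return the argv for a `docker run` call with a single mount.

    The default target path is a placeholder chosen for illustration.
    """
    return ["docker", "run", "--rm", "--mount",
            mount_spec(kind, source, target), image]


if __name__ == "__main__":
    # Bind mount: the project lives in an ordinary folder on the host OS.
    print(" ".join(docker_run_command("example/image", "bind",
                                      "/home/me/my-project")))
    # Named volume: Docker manages the storage, so there is no host folder
    # to browse directly.
    print(" ".join(docker_run_command("example/image", "volume",
                                      "my-project-data")))
```

With the bind mount, files stay directly visible to the host OS; with the named volume, Docker owns the storage, which sidesteps host filesystem quirks at the cost of that direct access.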
Kicking the tires is super easy via the demo server link, and all you need to use it locally is to download the Electron GUI or, alternatively, install a pip package. It would be great to get input on what would be valuable. If folks are interested in talking about using this in workshops (or just developing resources around transparent and open science strategies and tools), please let me know! I will of course support anyone who is interested in working with Gigantum, and I would love to run some workshops in partnership with other folks. (We're currently working on an open/reproducible neuro workshop with folks at Stanford and Columbia, so if anyone is interested in that specifically, please let me know soon!)

Best,
Dav

ps - To be clear, it's super easy to walk away from Gigantum. Everything is in a git repo, and the Dockerfile is usable outside of the platform (with a bit of work, which I'd be happy to walk people through).

On Thu, Aug 2, 2018 at 11:10 AM <[email protected]> wrote:
>
> Since this thread was highlighted in yesterday's Carpentry Clippings, I'll
> bet I'm not the last to jump in today, so I'll be brief.
>
> DVC was mentioned at the beginning, but I gather few here have given it a
> try. I encourage you to take a look. The tool is still in alpha, but
> developing quickly with a lot of potential. What I like about DVC:
>
> - Works in parallel to git and is similar to git LFS in
>   cloning/pushing/pulling references to data files
> - Data files are not tracked by git; your code repository remains just that
> - Supports external data sources (since 0.10.0); do you really want a copy
>   of your data *within* every repo that reads it?
> - Supports multiple cloud data sources (e.g. Amazon S3)
> - Does not default to "publishing" data on GitHub. GitHub is no Dataverse
>   or Figshare (... data discoverability, yada yada)
> - It's a makefile alternative too!
------------------------------------------
The Carpentries: discuss
Permalink: https://carpentries.topicbox.com/groups/discuss/Tb776978a905c0bf8-M6ec28f85bc59ba4ad6e66d6b
Delivery options: https://carpentries.topicbox.com/groups/discuss/subscription
