Since this directly applies, I saw this note in my RSS reader (NewsBlur) today: https://blog.github.com/2018-07-30-git-lfs-2.5.0-now-available/
Specifically, their `git lfs migrate import --fixup`, which helps with that dreaded "You cannot push to (whatever) because your repository is too big."

Also, because this isn't advertised anywhere: if you authenticate with education.github.com, you can have unlimited private repos and space.

(And, since it trips me up every time, remember that after "installing" git-lfs from packagecloud on Linux, you still have to apt install git-lfs, since packagecloud only adds the repo.)

________________________________
From: Jon Pipitone <[email protected]>
Sent: Thursday, 26 July 2018 1:19:45 AM
To: discuss
Subject: Re: [discuss] Version control and collaboration with large datasets.

> My five cents is that it really depends on the characteristics of your
> data (e.g. size) and the goal you try to achieve by versioning your
> data.

+1 to thinking carefully about what your goals are here before jumping to any particular tool.

My experience: I found myself re-organizing all my lab's neuroimaging data, starting from data collected when it was a single grad student up to when it was housing data from multiple studies and multiple sites of data collection. We opted to begin by first organizing the data with a sensible naming scheme on a shared drive, as Lars describes, because it was immediately accessible to everyone in the lab regardless of their tech know-how, and was also a necessary starting point regardless of whether we later adopted a fancier data versioning/sharing technology. We did later use a neuroimaging-specific system for sharing our data with others, but retained the filesystem organization in addition because it was familiar, and so darn convenient for scripting, documentation, etc.

Jon.

On 07/23/18, [email protected] wrote:
> Hi,
>
> My five cents is that it really depends on the characteristics of your data
> (e.g. size) and the goal you try to achieve by versioning your data.
>
> Examples:
>
> Size: If e.g. the datasets are "small", they can easily be handled by git.
> For larger datasets, it depends on what is important to you. E.g. a shared
> network file system with proper backup and a well-defined naming scheme can be
> totally fine in some cases, while a proper data repository issuing DOIs or
> similar is needed in other cases. If synchronization speed, as well as
> optimized storage, is important, something like dat or IPFS is advisable.
>
> Purpose: Similarly, if your goal is to share data with collaborators, then a
> simple HTTPS link is the easiest (hosted on e.g. GitHub, AWS, or a data
> repository).
>
> Cheers,
> Lars

------------------------------------------
The Carpentries: discuss
Permalink: https://carpentries.topicbox.com/groups/discuss/Tb776978a905c0bf8-M5489167c4c6220100f4abc5a
Delivery options: https://carpentries.topicbox.com/groups/discuss/subscription
------------------------------------------
The Carpentries: discuss
Permalink: https://carpentries.topicbox.com/groups/discuss/Tb776978a905c0bf8-M37515a5553a4c80373ac40d0
Delivery options: https://carpentries.topicbox.com/groups/discuss/subscription
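P.S. For anyone who hasn't used the `--fixup` mode mentioned at the top of the thread: it rewrites existing history so that files already matched by your .gitattributes patterns become LFS pointers, which is exactly the fix for the "repository is too big" push failure. Here's a minimal sketch; the repo, file name, and 1 MB size are illustrative, and it assumes git and git-lfs are installed (it skips cleanly otherwise):

```shell
# Sketch of the `git lfs migrate import --fixup` workflow; names/sizes are
# illustrative. Runs in a throwaway repo; skips if git-lfs is missing.
set -e
pointer_head=""
if git lfs version >/dev/null 2>&1; then
    repo=$(mktemp -d)
    cd "$repo"
    git init -q
    git config user.email "[email protected]"  # placeholder identity for the demo
    git config user.name "Demo"
    git lfs install --local >/dev/null

    # Simulate the problem: a large binary committed with plain git.
    head -c 1048576 /dev/urandom > data.bin
    git add data.bin
    git commit -qm "add data.bin without LFS"

    # Tell LFS which patterns it should own (writes .gitattributes)...
    git lfs track "*.bin" >/dev/null
    git add .gitattributes
    git commit -qm "track *.bin with LFS"

    # ...then rewrite history so already-committed matches become pointers.
    git lfs migrate import --fixup

    # The blob git now stores is a small pointer file, not the 1 MB payload.
    pointer_head=$(git cat-file -p HEAD:data.bin | head -n 1)
    echo "$pointer_head"
else
    echo "git-lfs not installed; skipping demo"
fi
```

After the rewrite you still need a force-push (`git push --force-with-lease`), since history has changed, so coordinate with collaborators first.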
