Hi Andrew. Thanks for your reply.
I'll probably test this myself, but would modifying and committing a 4GB text file actually add 4GB to the repository's size? I anticipate that it won't, since Git keeps track of the changes only, instead of storing a copy of the whole file (whereas this is not the case with binary files, hence the need for LFS).

Kind regards,
Farshid

> On 24 Jul 2017, at 12:29 pm, Andrew Ardill <andrew.ard...@gmail.com> wrote:
>
> Hi Farshid,
>
> On 24 July 2017 at 12:01, Farshid Zavareh <fhzava...@gmail.com> wrote:
>> I've been handed over a project that uses Git LFS for storing large CSV files.
>>
>> My understanding is that the main benefit of using Git LFS is to keep the
>> repository small for binary files, where Git can't keep track of the changes
>> and ends up storing whole files for each revision. For a text file, that
>> problem does not exist to begin with and Git can store only the changes. At
>> the same time, this is going to make checkouts unnecessarily slow, not to
>> mention the financial cost of storing the whole file for each revision.
>>
>> Is there something I'm missing here?
>
> Git LFS gives benefits when working on *large* files, not just large
> *binary* files.
>
> I can imagine a few reasons for using LFS for some CSV files
> (especially the kinds of files I deal with sometimes!).
>
> The main one is that many users don't need or want to download the
> large files, or all versions of the large file. Moreover, you probably
> don't care about changes between those files, or there would be so
> many that using the git machinery for comparing them would be
> cumbersome and ineffective.
>
> For me, if I was storing any CSV file over a couple of hundred
> megabytes I would consider using something like LFS. An example would
> be a large Dun & Bradstreet data file, which I do an analysis on
> every quarter. I want to include the file in the repository, so that
> the analysis can be replicated later on, but I don't want to add 4GB
> of data to the repo every single time the dataset gets updated (also
> every quarter). Storing that in LFS would be a good solution then.
>
> Regards,
>
> Andrew Ardill