Hi Andrew.

Thanks for your reply.

I'll probably test this myself, but would modifying and committing a 4GB text
file actually add 4GB to the repository's size? I anticipate that it won't,
since Git tracks only the changes rather than storing a copy of the whole
file (whereas this is not the case with binary files, hence the need for
LFS).

Kind regards,
Farshid

> On 24 Jul 2017, at 12:29 pm, Andrew Ardill <andrew.ard...@gmail.com> wrote:
> 
> Hi Farshid,
> 
> On 24 July 2017 at 12:01, Farshid Zavareh <fhzava...@gmail.com> wrote:
>> I've been handed a project that uses Git LFS for storing large CSV files.
>> 
>> My understanding is that the main benefit of using Git LFS is to keep the 
>> repository small for binary files, where Git can't track the changes 
>> and ends up storing whole files for each revision. For a text file, that 
>> problem does not exist to begin with, since Git can store only the changes. At 
>> the same time, using LFS is going to make checkouts unnecessarily slow, not to 
>> mention the financial cost of storing the whole file for each revision.
>> 
>> Is there something I'm missing here?
> 
> Git LFS gives benefits when working on *large* files, not just large
> *binary* files.
> 
> I can imagine a few reasons for using LFS for some CSV files
> (especially the kinds of files I deal with sometimes!).
> 
> The main one is that many users don't need or want to download the
> large files, or all versions of the large file. Moreover, you probably
> don't care about changes between those files; or, if you do, the changes
> would be so numerous that using the git machinery to compare them would
> be cumbersome and ineffective.
> 
> For me, if I were storing any CSV file over a couple of hundred
> megabytes I would consider using something like LFS. An example would
> be a large Dun & Bradstreet data file, which I analyse every
> quarter. I want to include the file in the repository, so that
> the analysis can be replicated later on, but I don't want to add 4GB
> of data to the repo every single time the dataset gets updated (also
> every quarter). Storing that in LFS would be a good solution then.
> 
> Regards,
> 
> Andrew Ardill
