Fwd: [git-users] How to use git to store large files without keeping track of versions?

Peng Yu Tue, 24 Feb 2015 10:17:02 -0800

I forgot to send this to the mailing list.


---------- Forwarded message ----------
From: Peng Yu <pengyu...@gmail.com>
Date: Tue, Feb 24, 2015 at 12:05 PM
Subject: Re: [git-users] How to use git to store large files without
keeping track of versions?
To: Konstantin Khomoutov <flatw...@users.sourceforge.net>


On Tue, Feb 24, 2015 at 7:45 AM, Konstantin Khomoutov
<flatw...@users.sourceforge.net> wrote:
> On Sun, 22 Feb 2015 07:51:59 -0800 (PST)
> pengyu...@gmail.com wrote:
>
>> I have some data files that need to be stored along with source code.
>> These data files are large, but I don't need to keep their versions.
>> I only need to keep the versions of the source code.
>>
>> git-annex is mainly for large files with versioning. Therefore, it is
>> not suitable for my situation.
>>
>> Does anybody know whether there is a way to use git to manage source
>> code (with versioning) as well as data files (without versioning)?
>
> It's a bit unclear what you are really asking for.
>
> Are you fine with keeping those files checked into your repository and
> are just afraid each commit will somehow include those files in the
> repository again and again (many guides state that with each commit Git
> stores snapshots, not deltas)?  If so, then fear not: while Git indeed
> stores snapshots, the content which did not change will not be
> somehow included again -- the new commit will reference existing
> objects representing those big files.  So, just add and commit these
> files once and just make sure you don't change them and commit these
> changes.
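
The point quoted above, that an unchanged file is not stored a second
time, can be checked with a small sketch. It assumes git is installed;
the temporary repository, the file names and the identity settings are
all made up for illustration.

```shell
set -eu

# Throwaway repository in a temporary directory (hypothetical names throughout).
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example"
git config user.email "example@example.com"

# Commit a stand-in "data" file together with a source file.
printf 'pretend this is a large data file\n' > data.bin
printf 'echo v1\n' > main.sh
git add data.bin main.sh
git commit -q -m "add data and source"
blob_before=$(git rev-parse HEAD:data.bin)

# Change only the source file and commit again.
printf 'echo v2, now changed\n' > main.sh
git commit -q -am "change source only"
blob_after=$(git rev-parse HEAD:data.bin)

# Both commits refer to data.bin through the same blob ID.
echo "$blob_before"
echo "$blob_after"
```

The two printed IDs should be identical, which is what "the new commit
will reference existing objects" means in practice.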

In my git repository, there are data files as well as source files.
The source files can be stored with the regular git mechanisms.
The problem is the data files (there are many of them, each many MBs).
They are stored as gz files, so they cannot be diffed effectively,
since the content of the data can change completely. Also, old data
files cannot be permanently deleted from the .git directory, even when
they are no longer present in the most recent commit. This can be an
unacceptable situation, as the git repository will quickly grow to a
size that is slow to use.
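
A minimal sketch of this growth, assuming git is installed. The file
name and contents are stand-ins for a real gz file, and the repository
is a throwaway one:

```shell
set -eu

# Throwaway repository (hypothetical names throughout).
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example"
git config user.email "example@example.com"

# First version of a stand-in data file.
printf 'first version of the data\n' > data.bin
git add data.bin
git commit -q -m "data v1"
old_blob=$(git rev-parse HEAD:data.bin)

# Replace the content completely and commit again.
printf 'second version, with completely different content\n' > data.bin
git commit -q -am "data v2"
new_blob=$(git rev-parse HEAD:data.bin)

# The old blob is not part of the latest commit's tree, but it is still
# in the object database because the first commit references it.
if git cat-file -e "$old_blob"; then old_present=yes; else old_present=no; fi
echo "old blob still stored: $old_present"
```

Every replaced version stays in .git this way for as long as the
history that references it exists.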

Is it clear what my scenario is?

> If this is not what you want, I can think of two more possibilities:

I am not referring to the following possibilities.

> * You want these files in the repo but don't want them to be checked
>   out by default.
>
> * You do not want these files even in the repo.
>
> Which one do you want?
>
> The first case is made possible by the simple fact that Git is able to store
> any object in its database -- when you do this, the contents of the
> object, named "blob" in Git's parlance, will persist in the repository
> as long as there's something referencing it (let's not dig into
> nitty-gritty details of this for now).  So a common idiom is to put an
> object into the repository and then make a tag (usually annotated)
> pointing to it:
>
>   $ git tag -a my-big-file $(git hash-object -w my-big-file)
>
> The `git hash-object -w` command will read the specified file, put it
> into the repo and print the SHA-1 hash calculated over its contents.
> The `git tag -a` command then creates a tag (named "my-big-file")
> pointing to that hash.
>
> The upside of this approach is that it's simple and elegant: you have
> the file in the repo, and when you want its contents, you simply
> extract it to produce whatever file you want:
>
>   $ git cat-file blob my-big-file^{} >/some/path/to/my-big-file
>
> The downsides:
> * The data is in the repo.
> * If you have lots of files, "getting them out" of the repository
>   is tedious or requires scripting.
>
> The second case is the most vague: if you want certain files to be
> kept out of the repo, then Git is not the tool to manage them.
> Not directly at least.  Basically, commit a file into your repo which
> contains explanations of how to make these files available at the place
> where your repo is checked out.
>
> If you still want these files managed by Git, put them all in a
> separate repo, and add it as a submodule to the main repository.
> If you so wish, you'll be able to check out the main repo without
> the submodules so the big files will still technically be out of it
> while still kept in Git.
>
> git-annex is an interesting possibility as well but I'm afraid it's a
> standalone tool, that is, it does not really integrate into "regular"
> Git workflow (like Git submodules do).  But I'm not familiar with this
> tool so can't really comment on this.
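
The scripting mentioned under the downsides of the tag idiom might look
roughly like the sketch below. It assumes git is installed; the "data/"
tag prefix, the file names and the output directory are conventions
invented for this example, not anything git prescribes.

```shell
set -eu

# Throwaway repository and output directory (hypothetical names throughout).
repo=$(mktemp -d)
out=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example"
git config user.email "example@example.com"

# Store two stand-in data files as blobs and point an annotated tag at each.
for name in a.dat b.dat; do
    printf 'contents of %s\n' "$name" > "$name"
    git tag -a -m "data file $name" "data/$name" "$(git hash-object -w "$name")"
done

# Extract every blob tagged under data/ into the output directory.
git for-each-ref --format='%(refname:short)' refs/tags/data |
while read -r tag; do
    git cat-file blob "$tag^{}" > "$out/${tag#data/}"
done

ls "$out"
```

Passing -m to `git tag -a` avoids the editor prompt, which matters when
tagging many files in a loop.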



--
Regards,
Peng


