Re: [git-users] How to use git to store large files without keeping track of versions?

Konstantin Khomoutov Tue, 24 Feb 2015 05:46:43 -0800

On Sun, 22 Feb 2015 07:51:59 -0800 (PST)
[email protected] wrote:

> I have some data files that need to be stored along with source code.
> These data files are large, but I don't need to keep their versions.
> I only need to keep the versions of the source code.
> 
> git-annex is mainly for large files with versioning. Therefore, it is
> not suitable for my situation.
> 
> Does anybody know whether there is a way to use git to manage source
> code (with versioning) as well as data files (without versioning)?


It's a bit unclear what you are really asking for.

Are you fine with keeping those files checked into your repository, and
just afraid that each commit will somehow include those files into the
repository again and again (many guides state that with each commit Git
stores snapshots, not deltas)?  If so, then fear not: while Git indeed
stores snapshots, content which did not change is not stored again --
the new commit simply references the existing objects representing
those big files.  So just add and commit these files once, and make
sure you never modify them and commit those modifications.
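If you want to check this for yourself, here is a small experiment, sketched for a POSIX shell with Git on the `PATH`. It commits a large file once, commits an unrelated change on top, and then compares the blob IDs that both commits record for the large file. The file names and the throwaway identity are made up for the demo.

```shell
#!/bin/sh
set -e
# Throwaway identity so that committing works on a machine without Git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q

# A 5 MiB stand-in for the data file, plus a tiny source file.
dd if=/dev/urandom of=big.bin bs=1024 count=5120 2>/dev/null
echo 'int main(void) { return 0; }' > code.c
git add big.bin code.c
git commit -q -m 'Add code and data'

# Change only the source file and commit again.
echo '/* tweak */' >> code.c
git commit -q -a -m 'Change code only'

# Both commits record the very same blob ID for big.bin.
old=$(git rev-parse HEAD~1:big.bin)
new=$(git rev-parse HEAD:big.bin)
echo "HEAD~1:big.bin -> $old"
echo "HEAD:big.bin   -> $new"

# Listing every object reachable from any ref shows the big.bin blob only once.
git rev-list --objects --all | grep -c ' big.bin$'
```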

If this is not what you want, I can think of two more possibilities:

* You want these files in the repo but don't want them to be checked
  out by default.

* You do not want these files even in the repo.

Which one do you want?

The first case is made possible by the simple fact that Git is able to
store any object in its database -- when you do this, the contents of
the object, named a "blob" in Git's parlance, will persist in the
repository as long as something references it (let's not dig into the
nitty-gritty details of this for now).  So a common idiom is to put an
object into the repository and then make a tag (usually annotated)
pointing to it:

  $ git tag -a my-big-file $(git hash-object -w my-big-file)

The `git hash-object -w` command will read the specified file, put it
into the repo and print the SHA-1 hash calculated over its contents.
The `git tag -a` command then creates a tag (named "my-big-file")
pointing to that hash.

The upside of this approach is that it's simple and elegant: you have
the file in the repo, and when you want its contents, you simply
extract it to produce whatever file you want:

  $ git cat-file blob my-big-file^{} >/some/path/to/my-big-file
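Put together, the whole round trip looks something like this in a scratch repository. This is only a sketch: the file name `dataset.bin` and the tag name `dataset` are made up, and `-m` is passed so that `git tag -a` does not open an editor for the tag message. The `^{}` suffix tells Git to peel the annotated tag until it reaches the non-tag object it points to, here the blob.

```shell
#!/bin/sh
set -e
# Annotated tags need a tagger identity; this one is a throwaway.
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q

# Some stand-in "large" data file (1 MiB of random bytes).
dd if=/dev/urandom of=dataset.bin bs=1024 count=1024 2>/dev/null

# Store the file as a blob and keep it alive with an annotated tag.
blob=$(git hash-object -w dataset.bin)
git tag -a -m 'Raw dataset, not versioned' dataset "$blob"

# The tag object itself is a "tag"; peeling it with ^{} yields the "blob".
git cat-file -t dataset
git cat-file -t 'dataset^{}'

# Later (or in another clone): materialize the file wherever it is needed.
out=$(mktemp -d)
git cat-file blob 'dataset^{}' > "$out/dataset.bin"
cmp dataset.bin "$out/dataset.bin" && echo 'extracted copy is identical'
```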

The downsides:

* The data is in the repo (so, for instance, every clone downloads it).

* If you have lots of files, "getting them out" of the repository
  is tedious or requires scripting.
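For the "lots of files" case, the scripting need not be elaborate. Below is one possible convention, sketched in POSIX shell and not anything Git prescribes: every file in a hypothetical `data/` directory gets an annotated tag named after its path, and a second loop restores all of them from those tags. It assumes file names without whitespace.

```shell
#!/bin/sh
set -e
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
mkdir data
for n in 1 2 3; do
    dd if=/dev/urandom of="data/sample$n.bin" bs=1024 count=64 2>/dev/null
done

# Store: one annotated tag per file, named after its path (hypothetical convention).
for f in data/*; do
    git tag -a -m "data file $f" "$f" "$(git hash-object -w "$f")"
done

# Restore: walk the refs/tags/data/* tags and write each blob back under $dest.
dest=$(mktemp -d)
git for-each-ref --format='%(refname:lstrip=2)' refs/tags/data/ |
while read -r name; do
    mkdir -p "$dest/$(dirname "$name")"
    git cat-file blob "refs/tags/$name^{}" > "$dest/$name"
done

# diff prints nothing and succeeds when both trees are identical.
diff -r data "$dest/data" && echo 'all data files restored'
```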

The second case is the most vague: if you want certain files to be
kept out of the repo, then Git is not the tool to manage them, not
directly at least.  Basically, commit a file into your repo which
explains how to make these files available at the place where your
repo is checked out.

If you still want these files managed by Git, put them all in a
separate repo, and add it as a submodule to the main repository.
If you so wish, you'll then be able to clone the main repo without
initializing the submodule, so the big files will technically stay out
of it while still being kept in Git.

git-annex is an interesting possibility as well, but I'm afraid it's a
standalone tool, that is, it does not really integrate into the "regular"
Git workflow (like Git submodules do).  But I'm not familiar with this
tool, so I can't really comment on it.

-- 
You received this message because you are subscribed to the Google Groups "Git 
for human beings" group.