[git-users] Size of cloned git subtrees - only history / files for subtree needed

Haasip Satang Wed, 29 Aug 2012 17:21:19 -0700

Hi all, 

in short, the question behind the lengthy explanation below is: How can I 
create a clone of a subtree that contains only the data needed for that 
subtree in the .git folder?


In detail here is what I have tried already and what my setup looks like: 
We have a big repository containing multiple projects (political 
reasons, we cannot avoid that... at least for now). While this works 
fine for all the developers (they just clone the big repo and get all the 
projects they need), we are facing problems with our continuous build 
system (Jenkins). 

Here we would like to have a job for each single project; of course WITHOUT 
having to clone the whole big repo for every job, as this would lead to a 
significant overhead on disk. 

After searching around for some time I basically came across four potential 
solutions: 

1. Sparse Checkout
2. Submodules
3. Individual Repos with a manager script like repo, mr, git-status, and 
all the others that exist to tackle that problem
4. Subtrees

The problem with 1 is that you still have to clone the whole repo (including 
all history), only to then check out a part of it --> still disk overhead. 
As for submodules, I personally don't really like them and don't think they 
should be used in this case; they are kinda difficult to handle and can 
be fragile anyway. 
The additional script-based solution seems kinda hacky as well, so I didn't 
really follow up on that too much. 
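
The disk overhead of option 1 can be illustrated with a small sketch. The 
repository layout below (a toy bigrepo with the directories xyz and other) 
is a hypothetical stand-in, not the real setup, and the config-file style of 
sparse checkout is assumed. It only shows that the working tree shrinks 
while the objects for the other project stay in .git:

```shell
#!/bin/sh
# Toy reproduction: sparse checkout trims the working tree, not .git.
set -e
work=$(mktemp -d)

# Hypothetical stand-in for the big repo: two projects, xyz and other.
git init -q "$work/bigrepo"
cd "$work/bigrepo"
git config user.email "ci@example.com"
git config user.name "CI"
mkdir xyz other
echo a > xyz/a.txt
echo b > other/b.txt
git add .
git commit -q -m "add both projects"

# Full clone, then restrict the working tree to xyz/ only.
git clone -q "file://$work/bigrepo" "$work/sparse"
cd "$work/sparse"
git config core.sparseCheckout true
mkdir -p .git/info
echo "xyz/" > .git/info/sparse-checkout
git read-tree -mu HEAD

# The working tree now lists only the sparse paths ...
ls
# ... but the blob of the other project is still stored in .git.
git cat-file -e HEAD:other/b.txt && echo "other/b.txt still in object store"
```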

So my favorite solution so far is actually using git subtree, which is more 
or less easy (especially since the subtree branches are only used for the 
CI builds / in a read only way, nothing needs to be pushed back to the 
bigrepo). 

The problem is, however, that when I clone the bare repo, create the 
subtree branches in the cloned working copy, and then try to clone only 
these subtree branches, I still seem to get the whole big history, 
including all the stuff outside the tree. 

Is there any way to avoid that and create a synthetic project history 
containing only data relevant for the subtree? 

What I did to kinda get there is more of a hack. I create the subtree 
branch using: 

git subtree split --prefix=xyz --annotate="[xy] " --rejoin -b subtrees/xyz

Then I clone that with: 

git clone --depth 1 --no-hardlinks file:///home/me/gitTests/subtreeRepo -b subtrees/xyz xyz
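
For reference, the two steps can be tried end to end on a throwaway 
repository. This is only a sketch under assumptions: the toy bigrepo and its 
file names are made up, git subtree must be installed, and --rejoin is left 
out to keep the toy history simple:

```shell
#!/bin/sh
# Toy reproduction of "subtree split" followed by a shallow clone.
set -e
work=$(mktemp -d)

# Hypothetical big repo: three commits, only two of them touch xyz/.
git init -q "$work/bigrepo"
cd "$work/bigrepo"
git config user.email "ci@example.com"
git config user.name "CI"
mkdir xyz other
echo a > xyz/a.txt
echo b > other/b.txt
git add .
git commit -q -m "add both projects"
echo c > other/c.txt
git add .
git commit -q -m "change other only"
echo a2 >> xyz/a.txt
git add .
git commit -q -m "change xyz only"

# Create the synthetic history for xyz/ on its own branch.
git subtree split --prefix=xyz --annotate="[xy] " -b subtrees/xyz >/dev/null

# Compare the length of the split history with the full history.
git rev-list --count subtrees/xyz
git rev-list --count HEAD

# Shallow clone of the split branch only (file:// forces the pack transport).
git clone -q --depth 1 --no-hardlinks -b subtrees/xyz \
    "file://$work/bigrepo" "$work/xyz"
ls "$work/xyz"
```

In the clone, the files of xyz sit at the top level, because the split 
rewrites the prefix directory to be the root of the tree.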

So creating a shallow clone (depth 1) seems to be the only way, and even 
that only works on the local Linux machine. If I clone the same subtreeRepo 
branch on a remote machine I actually get the whole big pack / history with 
it (which I of course don't want). 
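
One possible explanation, offered as an assumption and not verified on the 
setup described here: the -b option of git clone only selects which branch 
gets checked out, and all other branches are still fetched unless 
--single-branch is given (which --depth implies). The sketch below compares 
the two on a made-up toy repo; it assumes git subtree is installed and a Git 
version that supports --single-branch:

```shell
#!/bin/sh
# Compare "clone -b" with and without --single-branch on a toy repo.
set -e
work=$(mktemp -d)

# Hypothetical big repo with two projects and a split branch for xyz/.
git init -q "$work/bigrepo"
cd "$work/bigrepo"
git config user.email "ci@example.com"
git config user.name "CI"
mkdir xyz other
echo a > xyz/a.txt
echo b > other/b.txt
git add .
git commit -q -m "add both projects"
git subtree split --prefix=xyz -b subtrees/xyz >/dev/null

# Remember a blob that belongs only to the other project.
other_blob=$(git rev-parse HEAD:other/b.txt)

# -b alone: every branch of the big repo is fetched as well.
git clone -q -b subtrees/xyz "file://$work/bigrepo" "$work/all-branches"

# -b plus --single-branch: only the split branch is fetched.
git clone -q -b subtrees/xyz --single-branch \
    "file://$work/bigrepo" "$work/one-branch"

# Show how many objects each clone ended up with.
git -C "$work/all-branches" count-objects -v
git -C "$work/one-branch" count-objects -v
```

Whether a given clone pulled in data from outside the subtree can then be 
checked by asking it for $other_blob with git cat-file -e.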

So what I did is I cloned the subtree branch locally and then cloned that 
repo from my remote Jenkins machine. While this seems to work (I haven't 
yet checked whether I'm getting the change sets needed to send out the 
emails), it seems both unnecessarily complicated and very hacky. 

To sum up, let me conclude with the question from the beginning: How can I 
create a clone of a subtree that contains only the data needed for that 
subtree in the .git folder? 

Looking forward to your comments and ideas :)

Thanks, Haasip





