Thanks Thomas, 

I was looking into these Jenkins options as well, but as you said yourself, 
it would be kind of a "short-term hack" and would definitely have a big 
impact on our feedback loop: we have a whole bunch of projects and 
branches and of course would like to get timely feedback (which is hard 
with only one Executor ;-) ). 

As for the hard links, please see my reply to Philip's answer above. I don't 
understand why cloning locally (with the file:// prefix, though) leads to a 
small repository while cloning from a remote machine results in a big one. 
When I then clone that small local clone remotely, it stays small and 
I end up with what I want to have... except that I need to jump across too 
many repos:

bare --> working copy to create subtrees --> local clone for only one 
subtree --> remote clone of that subtree working copy (which would then be 
used by Jenkins). 

It seems like we need a way to rewrite / create a separate pack file for 
something which only contains what is necessary for the subtree branch... 
not sure if that is possible though. 
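
A rough sketch of what I have in mind, in case it helps the discussion: push 
nothing but the split branch into a fresh bare repository, so that the objects 
stored there are only the ones reachable from that branch. Everything named 
here (bigrepo, projA, projB, projA-only.git) is made up for the demo, it 
assumes git subtree is installed, and it is untested against our real 
repository.

```shell
#!/bin/sh
# Sketch: publish only a split subtree branch into its own bare repository.
# All paths and names (bigrepo, projA, projB, projA-only.git) are hypothetical.
set -e

work=$(mktemp -d)
cd "$work"

# Identity for the throwaway commits, so the demo does not depend on global config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

# Stand-in for the big multi-project repository.
git init -q bigrepo
cd bigrepo
mkdir projA projB
echo a > projA/a.txt
echo b > projB/b.txt
git add .
git commit -q -m "add projA and projB"
echo a2 >> projA/a.txt
git commit -q -am "change projA"
echo b2 >> projB/b.txt
git commit -q -am "change projB"

# Synthetic history that contains only projA (no --rejoin, to keep the demo small).
git subtree split --prefix=projA -b subtrees/projA >/dev/null

# Fresh bare repository that receives nothing but the split branch.
git init -q --bare ../projA-only.git
git push -q ../projA-only.git subtrees/projA

cd ..
# Count the objects reachable from any ref in each repository.
big_objects=$(git -C bigrepo rev-list --objects --all | wc -l)
small_objects=$(git -C projA-only.git rev-list --objects --all | wc -l)
echo "bigrepo: $big_objects objects, projA-only.git: $small_objects objects"
```

Jenkins could then clone projA-only.git with -b subtrees/projA, and a repeated 
push of the same branch after each new split would keep it up to date.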

About splitting up into small repos... yes, I would love to do that. The 
problem is more of a political nature at the moment, and for now we have to 
merge sources between ClearCase and Git all the time (which 
sucks!!!). So having the whole ClearCase view under Git control as well just 
makes this already cumbersome process a lot easier for us. I don't see that I 
will be able to change that anytime soon, not until some people finally 
understand that it just doesn't make sense and that Git is definitely a good 
thing and can be used on enterprise-scope projects (as if all the references 
we gave were not enough ;) )... anyway... let's not go there... as I said... 
politics ;( So I hope that once the Git setup is running fine, people will 
lose their fear and we can get rid of the strange setup we are forced to 
have now. Regardless of that, I would still like to know what is causing 
this clone behavior, as I don't really understand what is going on...
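
One thing that might be worth ruling out (an assumption on my side, not 
something verified on our setup): git clone -b only selects which branch gets 
checked out and still fetches every branch, whereas --depth implies 
--single-branch. The sketch below uses an orphan branch named side as a 
stand-in for a split subtree branch and compares both kinds of clone; all 
repository and branch names are hypothetical.

```shell
#!/bin/sh
# Sketch: compare "clone -b" with and without --single-branch.
# The repositories src, all-branches and one-branch are hypothetical demo names.
set -e

work=$(mktemp -d)
cd "$work"

export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

# Source repository with one commit on the default branch.
git init -q src
cd src
echo main > main.txt
git add main.txt
git commit -q -m "main history"

# Orphan branch with unrelated history, standing in for a split subtree branch.
git checkout -q --orphan side
git rm -rfq .
echo side > side.txt
git add side.txt
git commit -q -m "unrelated side history"
cd ..

# -b alone: checks out side, but every branch is still fetched.
git clone -q -b side "file://$work/src" all-branches
# --single-branch: only the history leading to side is fetched.
git clone -q -b side --single-branch "file://$work/src" one-branch

all=$(git -C all-branches rev-list --objects --all | wc -l)
one=$(git -C one-branch rev-list --objects --all | wc -l)
echo "clone -b: $all objects, clone -b --single-branch: $one objects"
```

If the second number comes out smaller on the real repository as well, adding 
--single-branch to the remote clone may make the intermediate local clone 
unnecessary.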

On Thursday, August 30, 2012 09:36:50 UTC+2, Thomas Ferris wrote:
> On Thursday, August 30, 2012 2:21:15 AM UTC+2, Haasip Satang wrote:
>> Hi all, 
>> in short, the question behind the lengthy explanation below is: How can 
>> I create a clone of a subtree that only contains the data needed for that 
>> subtree in the .git folder?
>> In detail here is what I have tried already and what my setup looks like: 
>> We have a big repository containing multiple projects (political 
>> reasons, cannot avoid having that... at least for now). While this works 
>> fine for all the developers (they just clone the big repo and get all the 
>> projects they need), we are facing problems with our continuous build 
>> system (Jenkins). 
>> Here we would like to have a job for each single project; of course 
>> WITHOUT having to clone the whole big repo for every job, as this would 
>> lead to a significant overhead on disk. 
>> After searching around for some time I basically came across four 
>> potential solutions: 
>> 1. Sparse Checkout
>> 2. Submodules
>> 3. Individual Repos with a manager script like repo, mr, git-status, and 
>> all the others that exist to tackle that problem
>> 4. Subtrees
>> The problem with 1 is that you still have to clone the whole repo (including 
>> all history), only to then check out a part of it --> still disk overhead. 
>> As for submodules, I personally don't really like them and don't think 
>> they should be used in this case; they are kinda difficult to handle and 
>> can be fragile anyway. 
>> The additional script based solution seems kinda hacky as well, so I 
>> didn't really follow up on that too much. 
>> So my favorite solution so far is actually using git subtree, which is 
>> more or less easy (especially since the subtree branches are only used for 
>> the CI builds / in a read only way, nothing needs to be pushed back to the 
>> bigrepo). 
>> The problem is, however, when I clone the bare and then create the 
>> subtree branches in the cloned working copy and then try to clone these 
>> subtree branches only, I still seem to get the whole big history, including 
>> all the stuff outside the tree. 
>> Is there any way to avoid that and create a synthetic project history 
>> containing only data relevant for the subtree? 
>> What I did to kinda get there is more a hacky way. I create the subtree 
>> branch using: 
>> git subtree split --prefix=xyz --annotate="[xy] " --rejoin -b subtrees/xyz
>> Then I clone that with: 
>> git clone --depth 1 --no-hardlinks file:///home/me/gitTests/subtreeRepo 
>> -b subtrees/xyz xyz
>> So creating a shallow clone (depth 1) seems to be the only way, and that 
>> also only works on the local Linux machine. If I clone the same subtreeRepo 
>> branch on a remote machine I actually get the whole big pack / history with 
>> it (which I of course don't want). 
>> So what I did is I cloned the subtree branch locally and then cloned that 
>> repo from my remote Jenkins machine. While this seems to work (I haven't 
>> checked yet whether I'm getting the necessary change sets to send out the 
>> emails), it seems both unnecessarily complicated and very hacky. 
>> To sum up, let me conclude with the question from the beginning: How can 
>> I create a clone of a subtree that only contains the data needed for that 
>> subtree in the .git folder? 
>> Looking forward to your comments and ideas :)
>> Thanks, Haasip
> Tricky situation. We tackled it ourselves by splitting into many smaller 
> repositories, and using gitslave to 
> organize them.
> One short-term hack (tm) I can think of, is to have the different Jenkins 
> jobs share one workspace directory on disk:
>   Job -> Configure -> Advanced Project Options -> Use custom workspace
> Make sure that no two jobs start running in the same workspace at the same 
> time, as it can mess up the state of the source code mid-build. The easiest 
> way for this is to just configure one Executor per Jenkins node.
> Another solution: *If you're on a linux system*, there might be something 
> to gain by cloning from a disk-local repository, so Git will make use of 
> hard-links to save disk space. See:
>   Job -> Configure -> Advanced (second one) -> Path of the reference repo 
> to use during clone (optional)
> In the long term, you want to split up the big repository into the bits 
> that can be built and tested in isolation. 
> (Many people see Subversion's arbitrary tree checkout as a strength in 
> this sense, but I like to think that it's a matter of good architecture to 
> split the things that don't really belong together into different 
> repositories.)
