> * Consider having that queue of yours just send the pushed payload
> instead of "pull this", see git-bundle. This can turn this sync entire
> thing into a static file distribution problem.
As far as I know, GHE doesn't support this out of the box. We've asked
them for essentially this, though. Due to the nature of our license we
may not be able to configure something like this on the server
> * It's not clear from your post why you have to worry about all these
> branches, surely your Chef instances just need the "master" branch,
> just push that around.
We allow deployments from non-master branches, so we do need multiple
branches. We also use the replication fleet as the target for our
build system, which needs to be able to build essentially any branch
on any repository.
> * If you do need branches consider archiving stale tags/branches
> after some time. I implemented this where I work, we just have a
> $REPO-archive.git with every tag/branch ever created for a given
> $REPO.git, and delete refs after a certain time.
This is something else that we're actively considering. Why did your
company implement this -- was it to reduce load, or just to clean up
your repositories? Did you notice any change in server load?
> * If your problem is that you're CPU bound on the master have you
> considered maybe solving this with something like NFS, i.e. replace
> your ad-hoc replication with just a bunch of "slave" boxes that mount
> the remote filesystem.
This is definitely an interesting idea. It'd be a significant
architectural change, though, and not one I'm sure we'd be able to get
> * Or, if you're willing to deal with occasional transitory repo
> corruption (just retry?): rsync.
I think this is a cost we're not necessarily okay with having to deal with.
> * Theres's no reason for why your replication chain needs to be
> single-level if master CPU is really the issue. You could have master
> -> N slaves -> N*X slaves, or some combination thereof.
This was discussed above - if the primary driver of load is the first
fetch, then moving to a multi-tiered architecture will not solve our
> * Does it really even matter that your "slave" machines are all
> up-to-date? We have something similar at work but it's just a minutely
> cronjob that does "git fetch" on some repos, since the downstream
> thing (e.g. the chef run) doesn't run more than once every 30m or
> whatever anyway.
It does, because we use the replication fleet for our build server.