Re: [git-users] Synchronizing air gapped git repositories using bundles

Lowell Alleman Thu, 02 Feb 2017 10:24:17 -0800

Philip,

Thanks for the reply!

The reason I'm looking at using a script is mostly standardization: so
that the file names are consistent, and to capture some bundle metadata
necessary for the file transfer process (file name, size, checksum, ...).
We also capture some metadata about the bundle contents, such as the
revision count and some delta details (specifically, the output of "git
diff --stat" and "git log --stat").  This helps answer the question of what
is being transferred in a given bundle.  (To the best of my knowledge,
there's no way to get this info from the bundle file itself.)  Secondary
reasons for the script come down to mixed levels of user fluency with git,
a general mandate to automate tasks, and, currently, the script is
responsible for tracking the "last export point" via tags.  (Oh, and I
found it easy to forget to include refs, like refs/heads/master, in the
bundle, and then importing became super painful on the other side.)
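A minimal sketch of such an export helper follows, assuming a POSIX shell,
GNU sha256sum, and a hypothetical "last-export-<dest>" tag as the export
marker (similar in spirit to the "git tag -f lastR2bundle" step in the
git-bundle documentation).  It writes a manifest next to the bundle with
the size, checksum, revision count, the refs the bundle carries ("git
bundle list-heads"), and the "git log --stat" / "git diff --stat" output.
The function name, tag name, and file-name scheme are illustrative, not an
existing tool.

```shell
#!/bin/sh
# Sketch of a tag-based export helper; every name here is hypothetical.
# Assumptions: a "last-export-<dest>" tag marks the previous export point,
# sha256sum (GNU coreutils) is available, and files go to the current dir.

export_bundle() {
    dest=$1                      # destination label, e.g. REPO2
    branch=${2:-master}          # branch to export
    tag="last-export-$dest"

    # First export (no tag yet) bundles the whole branch; later exports
    # bundle only what is new since the tag.
    if git rev-parse -q --verify "refs/tags/$tag" >/dev/null; then
        range="$tag..$branch"
    else
        range="$branch"
    fi

    count=$(git rev-list "$range" | wc -l | tr -d ' ')
    if [ "$count" -eq 0 ]; then
        echo "nothing new on $branch since $tag" >&2
        return 1
    fi

    # Date plus abbreviated tip id keeps file names unique and sortable.
    bundle="$dest-$branch-$(date +%Y%m%d)-$(git rev-parse --short "$branch").bundle"

    # Naming the branch (not a bare commit id) records refs/heads/$branch
    # in the bundle, so the importing side has a ref to fetch.
    git bundle create "$bundle" "$range" || return 1

    {
        echo "file:      $bundle"
        echo "size:      $(wc -c < "$bundle" | tr -d ' ') bytes"
        echo "sha256:    $(sha256sum "$bundle" | cut -d ' ' -f 1)"
        echo "revisions: $count"
        echo "range:     $range"
        echo
        echo "== refs carried by the bundle =="
        git bundle list-heads "$bundle"
        echo
        echo "== git log --stat =="
        git log --stat "$range"
        if [ "$range" != "$branch" ]; then
            echo
            echo "== git diff --stat $tag $branch =="
            git diff --stat "$tag" "$branch"
        fi
    } > "$bundle.manifest"

    # Moves the marker as soon as the bundle exists; a stricter process
    # might wait until the transfer has been approved.
    git tag -f "$tag" "$branch" >/dev/null
    echo "$bundle"
}
```

Running "export_bundle REPO2" a second time with no new commits should
fail with "nothing new", because the tag has already moved to the tip.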

I was trying to stick with specific revisions and avoid overlapping
exports for a few reasons: (1) so that we could build a change "manifest"
to go along with the bundle that would only include what's "new", (2) so
that if we need to release multiple fixes in a short period of time, like
more than one a day, we don't end up copying the same stuff around over
and over again (we are looking at a scheduled monthly sync to keep
divergence from becoming significant, but we may need to sync multiple
times a day on rare occasions), and (3) to generally minimize file
transfer size (not a huge deal; we're talking a few MBs).

I fully agree on the file transfer rejections point.  It hasn't happened
yet, and policy work is ongoing.

When you refer to bundle stacking, is there a way to specify multiple
locations to pull from at once, or are you just referring to the fact that
you can sequentially pull from multiple bundle files?  (I'm assuming the
second.)
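Assuming the second meaning, sequential stacking can be sketched as a loop
that applies bundles oldest first into one mirror branch and stops at the
first bundle whose prerequisites are missing.  The function name is
hypothetical, and the sketch assumes each bundle carries refs/heads/master
and that the bundle file names sort chronologically.

```shell
#!/bin/sh
# Sketch: apply a stack of bundles from one upstream, oldest first.
# Assumptions (hypothetical): each bundle carries refs/heads/master, the
# file names sort chronologically, and $mirror is not the checked-out
# branch (git refuses to fetch into the current branch).

import_stack() {
    mirror=$1
    shift
    for b in "$@"; do
        # verify checks that all of the bundle's prerequisite commits exist here.
        if ! git bundle verify "$b" >/dev/null 2>&1; then
            echo "$b: prerequisites missing; apply an older bundle first" >&2
            return 1
        fi
        # No leading '+' on the refspec, so only fast-forward updates are accepted.
        git fetch -q "$b" "master:refs/heads/$mirror" || return 1
    done
}
```

For example, "import_stack mirror-REPO1 REPO1-master-*.bundle" relies on
the shell expanding the glob in sorted order, which is why a sortable date
in the (hypothetical) file names matters.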

Yes, "recording what has been transferred" is exactly the core issue I'm
facing.  I've noted above some of the reasons I was trying to use a tighter
revision selection (using tags) vs. using dates, but I'm certainly
reconsidering that thought process.  The more I think about the date-based
approach, the more I like it.  It would dramatically simplify the process
and work around the inherent issues/limitations of tags (specifically
regarding moving tags).  The fundamental challenge is that there is no
repository (in the general sense, not necessarily git) that can be
accessed from all of the environments.  Ah, the joys of air-gapped
networks... so much fun!

I guess the biggest downside is just transferring extra stuff around and
having to write down dates (probably on a wiki, or something like that).

So let's say we set up a "monthly" transfer schedule; I should be able to
use something like this:

   git bundle create mystuff-Jan2017.bundle --since=1.month.ago master

And if I accidentally skip a month (which I'd only discover after
transferring the above file), I should be able to do this:

   git bundle create mystuff-Dec2016.bundle --since=2.month.ago --before=1.month.ago master
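A minimal sketch of the date-cut approach, combined with a
<server>-<since>-<created>.bundle naming scheme like the one suggested in
the reply below, might look like this; the function name, server label,
and YYYY-MM-DD date format are assumptions.  One hedged caution: in recent
versions of git, when rev-list options such as --before exclude the commit
a named ref points to, "git bundle create" warns that the ref is excluded
by the rev-list options and leaves it out, so the Dec2016 command above may
be refused as an empty bundle.  Because overlapping bundles are harmless,
widening the window (for example --since=2.month.ago master) avoids that
case.

```shell
#!/bin/sh
# Sketch of a date-cut bundle using a <server>-<since>-<created>.bundle
# naming scheme; the server label and YYYY-MM-DD dates are assumptions.

date_bundle() {
    server=$1             # label of the exporting server, e.g. server1
    since=$2              # cut point (YYYY-MM-DD), also written in the admin log
    branch=${3:-master}
    bundle="$server-$since-$(date +%Y-%m-%d).bundle"

    # --since keeps commits whose committer date is after the cut point;
    # the parents just outside the window become the bundle's prerequisites.
    git bundle create "$bundle" --since="$since" "$branch" || return 1
    echo "$bundle"
}
```

After a skipped month, moving the cut point back (for example
"date_bundle server1 2016-12-01") produces one overlapping bundle instead
of two separate date windows.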

One thing I'm trying to figure out is whether I should include the
"--branches=master" filter as well.  I don't really want to synchronize
other branches, which are mostly used for merging or for really big
changes.  I also don't want to include the "mirror" branches, but that's
probably not a big deal since most of the revisions are shared between the
branches anyway.  And if I don't limit the revisions to just the "master"
branch, and an unwanted branch contains revisions from before the exported
time frame, then I suspect that I may end up with revision dependencies in
the bundle that I don't want.

In other words, say a bundle contains revisions from the "master" and
"rewrite-it-all" branches.  (These branches are independent of each other;
specifically, no merging occurs between them during the exported time
frame.)  Because each branch started before the export time frame, there
are 2 external revision dependencies (let's say aaa for master, and bbb for
rewrite-it-all).  I have "aaa" in my local repository, but not "bbb", so
"git bundle verify" would fail.  At this point, can I still import the
"master" branch from the bundle?  Or does git require that the repository
have ALL of the revision dependencies for all revisions in the bundle?
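The scenario can be reproduced in a scratch repository, with tags aaa and
bbb standing in for the two external dependencies (all names here are
illustrative).  The script compares "git bundle verify" for a bundle that
carries both branches against one that carries only master, in a
destination that has aaa but has never seen bbb; it should print
"both branches: no, master only: yes".

```shell
#!/bin/sh
# Throwaway reproduction of the master / rewrite-it-all scenario. The tags
# aaa and bbb stand in for the two external dependencies; all names are
# illustrative.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
git init -q "$work/src"
cd "$work/src"
git symbolic-ref HEAD refs/heads/master

git commit -q --allow-empty -m base && git tag aaa   # last shared point on master
git checkout -q -b rewrite-it-all
git commit -q --allow-empty -m side && git tag bbb   # predates the export window
git checkout -q master

# The destination already has aaa, but has never seen bbb.
git init -q "$work/dst"
(cd "$work/dst" && git fetch -q "$work/src" master:refs/heads/mirror)

# New work on both branches inside the export window.
git commit -q --allow-empty -m "new on master"
git checkout -q rewrite-it-all
git commit -q --allow-empty -m "new on rewrite"
git checkout -q master

# Bundle with both branches: prerequisites are aaa and bbb.
git bundle create "$work/both.bundle" aaa..master bbb..rewrite-it-all 2>/dev/null
# Bundle with master only: the only prerequisite is aaa.
git bundle create "$work/master.bundle" aaa..master 2>/dev/null

cd "$work/dst"
if git bundle verify "$work/both.bundle" >/dev/null 2>&1; then both_ok=yes; else both_ok=no; fi
if git bundle verify "$work/master.bundle" >/dev/null 2>&1; then master_ok=yes; else master_ok=no; fi
echo "both branches: $both_ok, master only: $master_ok"
```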

On the other hand, I've also had issues getting bundle create to include
some very specific merge commit revisions in the past.  (I think these were
explicit no-change merge commits.)  I fought with it for quite some time,
but ultimately ended up just bundling up the entire repository and
distributing that.  I really want to avoid that in the future, as the total
repository size is becoming more significant.

So I guess the more fundamental question is this:  Is it better to use the
macro approach (and risk pushing around lots of extra stuff), which could
result in unreferenced revisions in the destination repositories, or is it
better to use the micro approach and be strategic about just the specific
branches/revisions I want to synchronize?

Right now I've been grabbing just the branch I want from the bundle, and
normally that's all it includes anyway (e.g. "git checkout mirror-REPO1;
git fetch my.repo1.bundle master").  I'm now wondering if, in doing so, I
could ultimately end up missing necessary revisions that would be imported
if I used a plain "git fetch" without naming a branch.  Is that possible?
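The single-branch import can be made explicit as a small function.  The
check at the end walks every object reachable from the mirror branch with
"git rev-list --objects", which should fail if anything the branch needs is
missing.  mirror-REPO1 and my.repo1.bundle come from the message above; the
function name and the final check are illustrative additions, not part of
any existing tool.

```shell
#!/bin/sh
# Sketch of the single-branch import from the thread, made explicit.
# mirror-REPO1 and my.repo1.bundle come from the message; the function
# name and the final connectivity check are illustrative additions.

sync_mirror() {
    bundle=$1      # e.g. my.repo1.bundle
    mirror=$2      # e.g. mirror-REPO1

    # verify checks every prerequisite commit listed in the bundle.
    git bundle verify "$bundle" >/dev/null 2>&1 || return 1

    git checkout -q "$mirror" || return 1
    # Fetching one ref brings every object reachable from it that this
    # repository is missing; the fetched tip lands in FETCH_HEAD.
    git fetch -q "$bundle" master || return 1
    # Mirror branches only ever move forward.
    git merge -q --ff-only FETCH_HEAD || return 1

    # Walks all objects reachable from the mirror; fails if any is absent.
    git rev-list --objects "$mirror" >/dev/null
}
```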

Okay, that's enough rambling.  Thanks again for any help you can provide.
As you may have gathered, I've been fighting with this process for quite
some time now.  I probably have an unbalanced knowledge of git that's
currently working against me.  I understand enough of how I think git works
to get me into trouble, but not enough to get back out of it. ;-)  And I'm
working with a rather old version of git (1.7).

Thanks in advance!

On Wednesday, February 1, 2017 at 6:47:33 PM UTC-5, Philip Oakley wrote:
>
> Hi Lowell,
>  
> You can use all of the options in the rev-list for selecting which commits 
> are in the bundle (which is just a thin wrapper around the pack file that 
> would be sent over the wire). 
>  
> You can include more commits in the bundle than you need [1], that is,
> have an overlap. One option is simply to use the --since=<date> option as a
> way of ensuring you go far enough back in history, plus --all to get
> *everything* after that date [2].
>  
> I suspect that part of the problem is finding a way of recording what has
> been transferred in the three-way transfer. I'd suggest it's just as easy
> to use a small notebook (or formal admin log) for recording the date of
> transfers and use that to guide the bundle creation.
>  
> Plus you can always stack up the bundles, so you can fetch first from the
> oldest bundle, then from the newer bundle, etc.
>  
> I see you have the typical 'transfer review' process for the bundle
> exchange (which implies a certain kind of environment ;-). Does it ever
> fail/reject the transfer, or is it simply making sure it is what you
> thought it was and that you have recorded the transfer correctly (I expect
> it's actually the latter)? If you get true rejections, you have more issues.
>  
> I don't really think you need a special 'script' (beyond satisfying some 
> edict), as the bundle and fetch commands should be sufficient for doing the 
> transfer.
>  
> Probably the biggest issue at that point is having a standardised naming 
> convention for the bundle file, e.g. server<n>-<datethen>-<datenow>.bndl so 
> that you know where it came from, where the --since cut point was, and when 
> it was created.
>  
> Then it becomes fairly easy to import/fetch from the bundle according to
> the carefully mandated process.
>  
> Philip
>  
> [1] https://git-scm.com/docs/git-bundle
> It is okay to err on the side of caution, causing the bundle file to 
> contain objects already in the destination, as these are ignored when 
> unpacking at the destination.
> [2] http://stackoverflow.com/questions/11792671/how-to-git-bundle-a-complete-repo
>
> ----- Original Message ----- 
> *From:* Lowell Alleman
> *To:* Git for human beings
> *Sent:* Wednesday, February 01, 2017 9:58 PM
> *Subject:* [git-users] Synchronizing air gapped git repositories using 
> bundles
>
> I have 3 separate air-gapped git repositories (hosted on local GitHub
> Enterprise) that I'm trying to keep in sync.  Currently, I'm using "git
> bundle" to push revisions back and forth, which worked fairly well with
> just 2 repositories, but I'm struggling a bit since the 3rd (and final)
> repository was added to the mix.  I was using a single tag to track
> the point of last export as noted in the "git bundle" docs, but I'm
> struggling to make that scale beyond 2 repositories.
>
> In terms of information flow, we've deemed one of the repositories as 
> "primary" and the other two as "secondary" repositories.  So in a sense we 
> are using the "primary" repository like a development and merging area so 
> that all changes go through the primary repository and trickle down to the 
> secondary repositories.  Changes are always pushed upstream to primary, and 
> then synced down to the other secondary repository. 
>
> Please note that our use of git is more like a "versioned file system" 
> than the typical developer use case.  I go on to explain that a bit more 
> later, but wanted to get to my main question before everyone gives up on 
> reading this really long and complicated explanation of the mess I made. 
>
> *Q:  Does anyone know of any existing scripts, documented methods, or best 
> practices to follow when syncing a branch between multiple air-gapped 
> repositories?*
>
> *How we are using git:*  As noted above, this is NOT a typical 
> development-centered use-case.  Branching is very infrequent, and most work 
> is done on the "master" branch in each repository.  Unlike typical 
> developer-centric approaches, each clone (working copy) ends up tied to a 
> specific server, rather than a single developer.  So multiple users end up 
> working in the same working copy and committing code from one place.  The 
> team is small and the changes are infrequent enough that this works for us, 
> despite the atypical and less-than-ideal use case.
>
> *How we are using branches:*   We treat each repository as if it has just 
> one branch, a single "master".  However, because of the synchronization 
> requirements, we create special purpose branches in each repository that 
> essentially mirror the master branches of the other repositories.  So the 
> primary repository has 2 mirrored branches, one for each of the secondary 
> repositories.  And each secondary repository has a single mirrored branch 
> that represents the primary (upstream) repository.  (By convention, we have 
> agreed never to synchronize revisions directly between the two secondary 
> repositories.)  Local changes are never applied to a mirrored repository 
> branch, so that it should match the "master" branch of the mirrored 
> repository exactly.  (That is, the only changes to these mirrored branches 
> are fast-forward only "pull"s made from bundle files exported from 
> the mirrored repository.)   The process of merging changes between branches 
> is manual, and I think I want to keep it that way for the foreseeable 
> future.  (Perhaps one day I'll make fast-forward merges apply 
> automatically, but in general I want a human to be responsible for this 
> step.)  So while each repository's "master" branch may diverge, or at
> least have a slightly different history, in the end, they should all end up 
> with the same content.  Well, at least that's the ultimate goal. 
>
> *File transfer:*  Transferring bundle files between air-gapped
> environments involves multiple human steps including content review,
> approval, and some safety checks for compliance.  Therefore, there's no way 
> to automatically schedule synchronization, which is a bummer.   That being 
> said, I'd like to make this as painless as possible within the realm of 
> what I can control.  I'm looking to create import and export scripts (or 
> find existing ones to borrow from) that handle bundle creation and the 
> import process. 
>
> I'm looking for a little help designing an appropriate synchronization 
> solution, and would appreciate any feedback you may have.  The combination 
> of using git bundle and our non-traditional use case has made it difficult 
> to find relevant resources. If there is anything I've missed, please point 
> me in the right direction.
>
> -- 
> You received this message because you are subscribed to the Google Groups 
> "Git for human beings" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to git-users+...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
>
>
