This is a demonstration of a mildly-interesting security concern
relating to Git & git-filter-branch - not a vulnerability in Git
itself, just in the way it can be used. I thought it was interesting
to demonstrate that there is sometimes an avenue of attack for
recovering sensitive data that's been removed from Git history using
git-filter-branch. I think it's a low-severity issue, you may wish to
ignore this, and indeed I've been very politely told already that it's
clearly nonsense :)

Here's an unmodified repo, in which the user unwisely committed a
database password:

The unwise commit is reverted with a second commit using 'git revert',
which obviously leaves the password in Git history, and - some time
later - it's decided to properly clean the repo history with
git-filter-branch & git gc, purging the password so the repo can be
more widely shared (open-sourced, or just externally hosted).

git-filter-branch works exactly as intended, purging the password, but
the one thing it does not - typically - do is update the commit
message. So in the cleaned repo, the commit message for the revert
commit still looks like this:

It contains a commit id (8c9cfe3) which is no longer in the repo, but
can very easily be associated with an existing commit simply by
examining the subject line of the reverted commit ("Carelessly
checking password into source control"). It's also obvious, from
examining the repo, where the excised data was removed (i.e. at the
"db.password=" line). At this point it's possible to do a brute-force
attack where you generate possible passwords, insert them into the
available commit's tree, and compare the resulting commit id against
the leaked commit id. When the commit id matches, the sensitive data
has been recovered.
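
To make the search concrete, here is a minimal Python sketch of the
idea. It is not the proof-of-concept tool itself, and it makes
simplifying assumptions that are mine: the repo is modelled as a single
file in the root tree, and the file name, author line and commit
message in the usage note are hypothetical placeholders. The object
hashing itself (a "<type> <size>\0" header followed by the body, fed
to SHA-1) is how Git derives object ids.

```python
import hashlib
from itertools import product


def git_object_id(kind, body):
    """SHA-1 of a Git object: '<kind> <size>\\0' header followed by the body."""
    header = f"{kind} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()


def tree_body(entries):
    """Serialise (mode, name, hex_id) entries. Simplification: plain files
    only, sorted by name (real trees also contain sub-directories)."""
    out = b""
    for mode, name, hex_id in sorted(entries, key=lambda e: e[1]):
        out += f"{mode} {name}\0".encode() + bytes.fromhex(hex_id)
    return out


def commit_id(tree_id, meta, message):
    """meta holds the parent/author/committer lines, which survive cleaning."""
    body = f"tree {tree_id}\n{meta}\n\n{message}\n".encode()
    return git_object_id("commit", body)


def brute_force(leaked_prefix, path, before, after, meta, message,
                alphabet, max_len):
    """Try every candidate secret between `before` and `after` in the file,
    rebuild blob -> tree -> commit, and compare with the leaked id."""
    for length in range(1, max_len + 1):
        for chars in product(alphabet, repeat=length):
            guess = "".join(chars)
            blob = git_object_id("blob", (before + guess + after).encode())
            tree = git_object_id("tree", tree_body([("100644", path, blob)]))
            if commit_id(tree, meta, message).startswith(leaked_prefix):
                return guess
    return None
```

Given a leaked id prefix such as the one left behind in a revert
message, a call like brute_force(prefix, "db.properties",
"db.password=", "\n", meta, message, alphabet, 5) walks the candidate
space in order of increasing length, much as the real search does. A
real repo needs the whole chain of trees from the file's directory up
to the root rebuilt for each guess, which this sketch leaves out.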

A proof-of-concept implementation of this attack was indeed able to
recover the purged password:

$ java -jar gma-0.1.jar 8c9cfe3c attack-pinpoint

Brute-force search using these characters : 0123456789abcdefghijklmnopqrstuvwxyz
Available commit, presumed cleaned : 8ebbf661
File path : src/main/resources/
Template blob : dca1a2fb
Exhausted strings of length 1 or less
Exhausted strings of length 4 or less
Match with '0g6rw'

So all of this amounts to a fairly low severity issue - people should
always change credentials when they mistakenly commit them to a repo -
but I guess the point is that from a paranoia point of view, you want
to remove all information - including old commit hashes buried in
commit messages - that relates to sensitive data when you clean a repo
for sharing. The git-filter-branch command has a --msg-filter option
which could be used for this purpose, with the application of some
judicious bash-scripting, grep&sed-ing. However, I must confess that I
believe users would be better advised to use The BFG:

The BFG already addresses this issue by replacing all old Git
object-ids found in commit/tag messages with the updated id. For
instance, here's that exact same commit message when cleaned with the
BFG:

In the case where the user specifies that a filtering operation is not
removing 'private' data, the BFG replaces old ids with text of the
form "newid [formerly oldid]", but if the operation is in fact to
strip private data, the replacement value is simply the newid - and
without the old commit id, the attack described above is not possible.
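
The two replacement styles can be illustrated with a short Python
sketch. This is not the BFG's implementation; the function name
scrub_ids and the shape of the old-to-new id map are assumptions made
for illustration. The example ids are the ones from the output above:
8c9cfe3c is the leaked id and 8ebbf661 is the cleaned commit.

```python
import re

# Anything that looks like a full or abbreviated lowercase hex object id.
HEX_ID = re.compile(r"\b[0-9a-f]{7,40}\b")


def scrub_ids(message, id_map, private):
    """Rewrite old object ids found in a commit message.

    id_map maps old ids to new ids (hypothetical structure). With
    private=True only the new id is written, so nothing about the old
    object leaks; otherwise the old id is kept as a 'formerly' note.
    """
    def replace(match):
        old = match.group(0)
        for full_old, new in id_map.items():
            # An abbreviated id matches any old id it is a prefix of.
            if full_old.startswith(old):
                short_new = new[:len(old)]
                if private:
                    return short_new
                return f"{short_new} [formerly {old}]"
        return old  # hex-looking text that is not a known id is left alone

    return HEX_ID.sub(replace, message)
```

With id_map = {"8c9cfe3c": "8ebbf661"}, the message "This reverts
commit 8c9cfe3." becomes "This reverts commit 8ebbf66." when private is
true, and "This reverts commit 8ebbf66 [formerly 8c9cfe3]." otherwise.
A function like this could also be wrapped in a small script that reads
the message on stdin and writes it to stdout, which is the contract
git-filter-branch's --msg-filter expects, though building the id map is
then left to the user.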

I believe it's worth educating users to give them a more realistic
understanding of their exposure, and would like to update the
documentation of git-filter-branch to give them a better idea of their
options for removing private data - that would include noting the BFG
as an alternative.

- Roberto Tyley