On Wednesday, October 16, 2013 02:34:01 am Jeff King wrote:
> I was recently presented with a repository with a
> corrupted packfile, and was asked if the data was
> recoverable. This post-mortem describes the steps I took
> to investigate and fix the problem. I thought others
> might find the process interesting, and it might help
> somebody in the same situation.

This is awesome Peff, thanks for the great writeup!

I have nightmares about this sort of thing every now and
then, and we do see occasional corruption here that needs
to be fixed (mainly missing objects when we toy with
different git repack arguments).  I cannot help but wonder
how we could improve git further to help diagnose, or even
fix, some of these problems.  More inline below...


> The first thing I did was pull the broken data out of the
> packfile. I needed to know how big the object was, which
> I found out with:
> 
>   $ git show-index <$idx | cut -d' ' -f1 | sort -n | grep -A1 51653873
>   51653873
>   51664736
> 
> Show-index gives us the list of objects and their
> offsets. We throw away everything but the offsets, and
> then sort them so that our interesting offset (which we
> got from the fsck output above) is followed immediately
> by the offset of the next object. Now we know that the
> object data is 10863 bytes long, and we can grab it
> with:
> 
>   dd if=$pack of=object bs=1 skip=51653873 count=10863

Is there a current plumbing command that should be enhanced
(maybe with some new switch) so that people debugging can do
the 2 steps above directly?  If not, should we create one,
such as git show --zlib or git cat-file --zlib?
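
To make the first step concrete, here is a rough sketch of
the size calculation, written in Python rather than as a git
command. The function name and the pack_size parameter are
made up for illustration, and the 20-byte trailer assumes a
SHA-1 pack; the offsets would come from the first column of
git show-index.

```python
import bisect


def packed_entry_span(offsets, start, pack_size):
    """Return (start, length) of the pack entry that begins at `start`.

    `offsets` holds the entry offsets of every object in the pack.
    `pack_size` is only consulted for the last entry, which is assumed
    to end where the 20-byte SHA-1 pack trailer begins.
    """
    ordered = sorted(offsets)
    i = bisect.bisect_left(ordered, start)
    if i == len(ordered) or ordered[i] != start:
        raise ValueError("no entry starts at offset %d" % start)
    if i + 1 < len(ordered):
        end = ordered[i + 1]      # the next entry starts where ours ends
    else:
        end = pack_size - 20      # last entry runs up to the trailer
    return start, end - start
```

With the two offsets from the quoted output, this gives
51664736 - 51653873 = 10863 bytes, the count passed to dd.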


> Note that the "object" file isn't fit for feeding
> straight to zlib; it has the git packed object header,
> which is variable-length. We want to strip that off so
> we can start playing with the zlib data directly. You
> can either work your way through it manually (the format
> is described in
> Documentation/technical/pack-format.txt), or you can
> walk through it in a debugger. I did the latter,
> creating a valid pack like:
> 
>   # pack magic and version
>   printf 'PACK\0\0\0\2' >tmp.pack
>   # pack has one object
>   printf '\0\0\0\1' >>tmp.pack
>   # now add our object data
>   cat object >>tmp.pack
>   # and then append the pack trailer
>   /path/to/git.git/test-sha1 -b <tmp.pack >trailer
>   cat trailer >>tmp.pack
> 
> and then running "git index-pack tmp.pack" in the
> debugger (stop at unpack_raw_entry). Doing this, I found
> that there were 3 bytes of header (and the header itself
> had a sane type and size). So I stripped those off with:
> 
>   dd if=object of=zlib bs=1 skip=3

This too feels like something we should eventually be able
to do with a plumbing command.  Perhaps git zlib-extract?
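
For anyone who would rather not use a debugger, below is a
minimal sketch of both steps in Python. It follows the entry
header layout from pack-format.txt (type in bits 4-6 of the
first byte, size spread over 4 bits and then 7 bits per
continuation byte) and rebuilds the one-object pack from the
quoted recipe. The function names are hypothetical, and the
parser only covers the type/size header: delta entries carry
extra base information before the zlib data.

```python
import hashlib
import struct


def parse_entry_header(data):
    """Return (type, size, header_len) for a packed object entry.

    Types: 1 commit, 2 tree, 3 blob, 4 tag, 6 ofs-delta, 7 ref-delta.
    For the two delta types, more bytes follow this header before the
    zlib stream starts; they are not handled here.
    """
    byte = data[0]
    obj_type = (byte >> 4) & 0x07
    size = byte & 0x0F            # low 4 bits of the inflated size
    shift = 4
    pos = 1
    while byte & 0x80:            # high bit set: another size byte follows
        byte = data[pos]
        size |= (byte & 0x7F) << shift
        shift += 7
        pos += 1
    return obj_type, size, pos


def build_single_object_pack(entry):
    """Wrap one raw pack entry in a version 2 pack with a SHA-1 trailer."""
    body = b"PACK" + struct.pack(">II", 2, 1) + entry
    return body + hashlib.sha1(body).digest()
```

If the header length comes back as 3, that matches the
skip=3 used in the dd command above.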

> So I took a different approach. Working under the guess
> that the corruption was limited to a single byte, I
> wrote a program to munge each byte individually, and try
> inflating the result. Since the object was only 10K
> compressed, that worked out to about 2.5M attempts,
> which took a few minutes.

Awesome!  Would this make a good new plumbing command, git 
zlib-fix?
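
As a strawman for what such a command might do internally,
here is a small Python sketch of the single-byte search. It
is not the "munge" program from the write-up, just the same
idea as I understand it: try every other value at every
offset and keep the candidates that inflate cleanly. The
function name and the optional expected_size check are
assumptions of this sketch.

```python
import zlib


def find_single_byte_fixes(data, expected_size=None):
    """Return [(offset, value), ...] replacements that make `data` inflate.

    Each byte is replaced in turn by all 255 other values. A candidate
    counts as a fix if zlib accepts the stream and, when `expected_size`
    is given, the inflated length matches it.
    """
    buf = bytearray(data)
    fixes = []
    for offset in range(len(buf)):
        original = buf[offset]
        for value in range(256):
            if value == original:
                continue
            buf[offset] = value
            try:
                out = zlib.decompress(bytes(buf))
            except zlib.error:
                continue          # still corrupt, try the next value
            if expected_size is None or len(out) == expected_size:
                fixes.append((offset, value))
        buf[offset] = original    # restore before moving on
    return fixes
```

For an object of about 10K bytes this is roughly 10K times
255 attempts, in line with the 2.5M figure quoted above.
Any candidate should still be checked against the object's
sha1 before it is trusted.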


> I fixed the packfile itself with:
> 
>   chmod +w $pack
>   printf '\xc7' | dd of=$pack bs=1 seek=51659518 conv=notrunc
>   chmod -w $pack
> 
> The '\xc7' comes from the replacement byte our "munge"
> program found. The offset 51659518 is derived by taking
> the original object offset (51653873), adding the
> replacement offset found by "munge" (5642), and then
> adding back in the 3 bytes of git header we stripped.

Another plumbing command needed?  git pack-put --zlib?
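
Whatever the command ends up being called, the arithmetic it
has to do is small. A sketch in Python, with hypothetical
function names, assuming the offsets are known from the
earlier steps:

```python
def pack_patch_offset(entry_offset, header_len, zlib_offset):
    """Translate an offset inside the zlib data into a pack file offset."""
    return entry_offset + header_len + zlib_offset


def patch_byte(path, offset, value):
    """Overwrite one byte in place, like dd with conv=notrunc."""
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(bytes([value]))
```

With the numbers from the write-up,
pack_patch_offset(51653873, 3, 5642) gives 51659518, and
patch_byte(pack, 51659518, 0xc7) would then write the
replacement byte. The pack still has to be made writable
first, and working on a copy seems prudent.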

I am not saying my command suggestions are good, but maybe 
they will inspire the right answer?

-Martin