It seems that git loose objects are zlib-compressed. If you pipe one through a zlib decompressor (such as /usr/share/doc/libcompress-zlib-perl/examples/filtinf, if you have libcompress-zlib-perl installed), you can see the actual data, which has a simple format: one of the words "commit", "blob", "tree" or "tag", followed by a space, then the size of the object as a string of decimal digits terminated by a zero byte, e.g., "commit 404\0".
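As a rough illustration of the layout described above, here is a minimal Python sketch that decompresses a loose object and splits off its header. The function name parse_loose_object is made up for this example, and the object is built in memory rather than read from a real repository.

```python
import zlib

# The four object type words named above.
KINDS = (b"commit", b"blob", b"tree", b"tag")

def parse_loose_object(raw: bytes):
    """Decompress a loose object and split "<type> <size>\\0" from the payload.

    Hypothetical helper for illustration only; it is not part of git or file.
    """
    data = zlib.decompress(raw)
    header, _, body = data.partition(b"\0")   # header ends at the first zero byte
    kind, _, size = header.partition(b" ")    # type word, space, decimal size
    if kind not in KINDS or not size.isdigit():
        raise ValueError("not a git loose object")
    return kind.decode("ascii"), int(size), body

# Build a synthetic blob in memory instead of reading .git/objects/.
payload = b"hello\n"
raw = zlib.compress(b"blob %d\0" % len(payload) + payload)
print(parse_loose_object(raw))
```

Note that the type word only becomes visible after decompression, which is why ordinary offset-based magic cannot see it.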
The problem is that there's no distinctive compression header on the files (a zlib stream begins with just two header bytes); "file -z" doesn't look inside the files, because they don't match any of its compression magic patterns. Unless the way that -z works is changed to try decompression on every file, I don't think it's possible to detect git loose objects, simply because they don't have any usable header data.

http://book.git-scm.com/7_browsing_git_objects.html
http://book.git-scm.com/7_the_packfile.html
http://repo.or.cz/w/git.git?a=blob;f=Documentation/technical/pack-format.txt;h=1803e64e465fa4f8f0fe520fc0fd95d0c9def5bd;hb=HEAD

This is true for the packfile format up until version 1.6 as well; the newer version of the .idx file has a magic number in the header, but the older one does not. The .pack file begins with 'PACK', followed by a four-byte version number and a four-byte count of contained objects, but this partly conflicts with id Software's .PAK file format, which begins with "PACK" as well.

http://www.gamers.org/dEngine/quake/spec/quake-spec33/qkspec_3.htm

On the other hand, the Quake packs have a little-endian offset immediately following the magic, the first byte of which is never zero, while the git packs have a big-endian version number, the first byte of which is always zero. We can then pretend that the magic for git packs is 'PACK\0', and this seems to disambiguate it reliably from id's format.

I believe that this is all the magic that can be applied to git objects; the formats aren't particularly amenable to it. The only other thing I can think of is commenting out the VAX COFF sections; I don't know how common those files are, but they seem to produce a lot of false positives. (Also, the README states that "Match of <= 16 bits are not accepted", and the VAX COFF magics are 16 bits.)

The attached patch applies against current Debian git, and implements what's described above. It doesn't fix the problem where zlib-compressed objects are detected as VAX COFF, but it does do the rest.

Adam Buchbinder
From 4785506b7f3371e49e3f93187597d0e496a52a3a Mon Sep 17 00:00:00 2001
From: Adam Buchbinder <[email protected]>
Date: Tue, 3 Feb 2009 13:27:37 -0500
Subject: [PATCH] Add detection magic for git packs and indexes, making sure it doesn't conflict with id Software .PAK files.

---
 debian/patches/00list                   |    1 +
 debian/patches/342-magic-add-git.dpatch |   44 +++++++++++++++++++++++++++++++
 2 files changed, 45 insertions(+), 0 deletions(-)
 create mode 100755 debian/patches/342-magic-add-git.dpatch

diff --git a/debian/patches/00list b/debian/patches/00list
index 82c74bf..6d2dba7 100644
--- a/debian/patches/00list
+++ b/debian/patches/00list
@@ -37,6 +37,7 @@
 339-magic-add-scribus.dpatch
 340-magic-add-selinux.dpatch
 341-magic-add-bzr.dpatch
+342-magic-add-git.dpatch
 901-file-mgc.dpatch
 903-file-localmagic.dpatch
 904-file-make.dpatch
diff --git a/debian/patches/342-magic-add-git.dpatch b/debian/patches/342-magic-add-git.dpatch
new file mode 100755
index 0000000..d517002
--- /dev/null
+++ b/debian/patches/342-magic-add-git.dpatch
@@ -0,0 +1,44 @@
+#! /bin/sh /usr/share/dpatch/dpatch-run
+## 342-magic-add-git.dpatch by Adam Buchbinder <[email protected]>
+##
+## All lines beginning with `## DP:' are a description of the patch.
+## DP: Add detection for git packs and indexes, making sure it doesn't
+## DP: clash with id Software PACK files. (Closes: #509942)
+
+...@dpatch@
+diff -urNad file~/magic/Magdir/games file/magic/Magdir/games
+--- file~/magic/Magdir/games 2009-01-29 16:01:53.000000000 -0500
++++ file/magic/Magdir/games 2009-02-03 13:20:29.000000000 -0500
+@@ -33,6 +33,7 @@
+ # Quake
+ 
+ 0 string PACK Quake I or II world or extension
++>8 lelong >0 \b, %d entries
+ 
+ #0 string -1\x0a Quake I demo
+ #>30 string x version %.4s
+diff -urNad file~/magic/Magdir/revision file/magic/Magdir/revision
+--- file~/magic/Magdir/revision 2009-01-29 16:01:53.000000000 -0500
++++ file/magic/Magdir/revision 2009-02-03 13:20:29.000000000 -0500
+@@ -12,6 +12,21 @@
+ # From: Josh Triplett <[email protected]>
+ 0 string #\ v2\ git\ bundle\n Git bundle
+ 
++# Type: Git pack
++# From: Adam Buchbinder <[email protected]>
++# The actual magic is 'PACK', but that clashes with Doom/Quake packs. However,
++# those have a little-endian offset immediately following the magic 'PACK',
++# the first byte of which is never 0, while the first byte of the Git pack
++# version, since it's a tiny number stored in big-endian format, is always 0.
++0 string PACK\0 Git pack
++>4 belong >0 \b, version %d
++>>8 belong >0 \b, %d objects
++
++# Type: Git pack index
++# From: Adam Buchbinder <[email protected]>
++0 string \377tOc Git pack index
++>4 belong =2 \b, version 2
++
+ # Type: Mercurial bundles
+ # From: Seo Sanghyeon <[email protected]>
+ 0 string HG10 Mercurial bundle,
--
1.5.6.3
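
For readers who don't read magic(5) syntax fluently, the checks above can be approximated in Python. This is only a sketch under stated assumptions: the function name classify and its output strings are invented for illustration, it reports whatever index version it finds rather than matching only version 2, and it is not how file(1) evaluates its rules.

```python
import struct

def classify(head: bytes) -> str:
    """Classify a file from its first 12 bytes (hypothetical helper)."""
    # Git pack index (newer format): magic "\377tOc", then a big-endian version.
    if head[:4] == b"\xfftOc" and len(head) >= 8:
        (version,) = struct.unpack(">I", head[4:8])
        return f"Git pack index, version {version}"
    if head[:4] != b"PACK" or len(head) < 12:
        return "unknown"
    # Git stores a small big-endian version number, so byte 4 is zero.
    if head[4] == 0:
        version, count = struct.unpack(">II", head[4:12])
        return f"Git pack, version {version}, {count} objects"
    # Otherwise treat bytes 4..7 as the little-endian offset of an id pack.
    (offset,) = struct.unpack("<I", head[4:8])
    return f"id Software PACK, directory offset {offset}"

print(classify(b"PACK" + struct.pack(">II", 2, 3)))
print(classify(b"PACK" + struct.pack("<II", 12, 64)))
print(classify(b"\xfftOc" + struct.pack(">I", 2)))
```

The three sample headers are synthetic; the only thing that separates the two PACK variants is whether the fifth byte is zero.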

