clone 569505 -1
tags -1 =
retitle -1 read-tree (?): silently skips some corrupted objects
thanks

Zygo Blaxell wrote:

> git checkout gives you a working tree where corrupt files are missing,
> and an index where corrupt files are marked deleted.

Not good. Will investigate.

> git filter-branch aborts when it sees the corrupt data if you have a
> tree-filter, but if you only have an index-filter it will ignore
> corrupt objects unless you do something to force it to examine their
> contents.

Seems sensible. Sometimes getting the right history requires repeated
invocations of filter-branch, so the ideal thing is to find some way
to examine (compare) the whole history before and after, and the next
best thing is to explicitly run a fsck before.

> filter-branch index-filter won't help you if other objects have been
> deltaified based on corrupt objects--at that point, recovery is very hard.
> I've only seen that occur on pack files that were corrupted outside of
> git, though, so it's not a Git problem.

I think conventional wisdom is that in that case the best thing is to
explode the pack with git unpack-objects -r and recover what you can.
If there is crucial data that that misses, one can use
git verify-pack -v as a starting point to examine and repair the
corruption.

> git gc will notice the corruption if it's packing corrupt loose objects.
> It fails to notice if it's not packing loose objects, e.g. because
> the loose objects are not old enough.

Right, this could be changed. I haven’t decided whether I think it’s
worth it (probably it is).

>> I assume you used rebase -f? Clever.
>
> I reset to one commit before the corruption, then manually extract the
> surviving changes between the commit after the corruption and the next
> commit that modifies the corrupted file.

Oh, sounds more painful. I guess I was expecting it to be easier
because the object data is all there; it just has the wrong SHA-1.
That is not the case in other corruption scenarios, so maybe it is
silly to spend too much time thinking about how to deal with it, but I
think it’s worth trying anyway (at least maybe to write a script for
contrib/).

> The problem isn't speed--the problem is tree-filter's requirement to check
> out the data. It can't, because the data is corrupt. filter-branch does
> check in that case, and it should (otherwise a filesystem on unreliable
> media could spray undetected junk into your repo).

It just does checkout-index, clean, and update-index; the only obvious
difference from a checkout + (munge) + add I can see is the clean.

> It's usually hard when the file was in some transient state during the
> SHA1 calculation. ;)

Ah, I guess this happens with e.g. text editor swapfiles? Ick.

>> - racy add, as you noticed;
>
> Only Git seems to have that. SVN and CVS didn't. Or maybe they did,
> but they lacked the internal integrity-checking mechanisms to detect it.

I suspect SVN just uses a CRC32 computed at the same time as the files
are compressed, which indeed would not have the same problem.

http://svnbook.red-bean.com/nightly/en/svn.ref.svnadmin.c.verify.html

CVS and RCS I have no clue about.

>> - checkout is not atomic or close to atomic;
>
> Not a problem in my use cases. Checkouts are very rare, usually only
> occurring after some disaster or other.

True enough --- if you can wait to checkout until nothing cares about
what’s happening with those files (e.g. a shutdown), there’s no
problem.

>> - large files are not supported well (but there is some work going on
>>   to change this);
>
> "Large" is relative to the size of the system doing the work. 15 years
> ago, 1MB was a "large" file; today, 1MB is on the high end of "small."

I had trouble tracking a small repository of audio files I was working
on because of this.

>> - uncompressible files are not supported well;
>
> Much better than CVS.
>
>> - rename detection works poorly with binary files;
>
> Still better than CVS or SVN.

Sure, as far as version control systems go, git is a good backup
system, but what about backup systems? Sadly, I don’t even know enough
to say which replicating, snapshot-based backup system is the standard
of care, so to speak.

>> - no quick way to throw away old history.
>
> I don't intend to throw away old history at all.

I guess if the history gets unmanageably big, one can start a new repo
and graft them together when needed.

> Compression, integrity checking, and replication are the big wins for me.
> The compression advantage of Git vs. other tools is not trivial. Git
> outperforms Subversion by something like 200:1.

I think any good backup system should have these things. Your other
reasons are more compelling. An unstated reason --- that git, like cvs
and svn, is a tool developers already often know quite well how to
use --- is also probably important.
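
For reference, a rough sketch of the check that catches the racy-add
case discussed above, where the object data is all there but is stored
under the wrong SHA-1. It relies only on the fact that an object's name
is the SHA-1 of "<type> <size>\0<content>" and that loose objects are
zlib-compressed. The function names are made up for illustration and
are not part of git; this is not how git fsck is implemented, just the
same idea in miniature.

```python
import hashlib
import zlib


def git_blob_id(data: bytes) -> str:
    """Return the object name git would assign to a blob with this content."""
    header = b"blob %d\x00" % len(data)
    return hashlib.sha1(header + data).hexdigest()


def make_loose_object(data: bytes):
    """Hypothetical helper: build (name, compressed bytes) for a blob."""
    raw = b"blob %d\x00" % len(data) + data
    return hashlib.sha1(raw).hexdigest(), zlib.compress(raw)


def loose_object_is_consistent(name: str, compressed: bytes) -> bool:
    """Check that a zlib-compressed loose object hashes to its own name."""
    try:
        raw = zlib.decompress(compressed)
    except zlib.error:
        # Not even valid zlib data: corrupt in the ordinary sense.
        return False
    return hashlib.sha1(raw).hexdigest() == name


if __name__ == "__main__":
    # Simulate a racy add: the name was computed from one version of the
    # file, but the bytes that were stored came from a later version.
    name, _ = make_loose_object(b"contents when hashed\n")
    _, stored = make_loose_object(b"contents when written\n")
    print(loose_object_is_consistent(name, stored))
```

In this simulated case the stored bytes decompress cleanly and form a
perfectly valid blob; only the name is wrong, which is why recovery
ought to be easier here than with a damaged pack.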

