On Sat, 16 Apr 2005, Petr Baudis wrote:

> Dear diary, on Sat, Apr 16, 2005 at 05:55:37PM CEST, I got a letter
> where Simon Fowler <[EMAIL PROTECTED]> told me that...
>
> > The id is a sha1 hash of the current time and the full path of the
> > file being added - the chances of that being replicated without
> > malicious intent is extremely small. There are other things that
> > could be used, like the hostname, username of the person running the
> > program, etc, but I don't really see them being necessary.
> 
> Why not just use UUID?

Note that using anything that isn't data-related totally destroys the 
whole point of the object database. Remember: any time we don't uniquely 
generate the same name for the same object, we'll waste disk-space.

So adding in user/machine/uuid's to the thing is always a mistake. The 
whole thing depends on the hash being as close to 1:1 with the contents as 
humanly possible. 

There's also the issue of size. Yes, I could have chosen sha256 instead of
sha1. But the keys would be almost twice as big, which in turn means that 
the "tree" objects would be bigger, and that the "index" file would be 
bigger.

Is that a huge problem? No. We can certainly move to it if sha1 ever shows
itself to be weak. But I really think we are much better off just
re-generating the whole tree and history at that point, rather than try to 
predict the future.

The fact is, with current knowledge, sha1 _is_ safe for what git uses it 
for, for the forseeable future. And we have a migration strategy if I'm 
wrong. Don't worry about it.

Almost all attacks on sha1 will depend on _replacing_ a file with a bogus
new one. So guys, instead of using sha256 or going overboard, just make 
sure that when you synchronize, you NEVER import a file you already have.

It's really that simple. Add "--ignore-existing" to your rsync scripts,
and you're pretty much done. That guarantees that a new evil blob by the
next mad scientist out to take over the world will never touch your
repository, and if we make this part of the _standard_ scripts, then
dammit, security is in good _practices_ rather than just relying blindly
on the hash being secure.

In other words, I think we could have used md5's as the hash, if we just
make sure we have good practices. And it wouldn't have been "insecure".

The fact is, you don't merge with people you don't trust. If you don't
trust them, they have a much easier time corrupting your repository by
just creating bugs in the code and checking that thing in. Who cares about
hash collisions, when you can generate a kernel root vulnerability by just
adding a single line of code and use the _correct_ hash for it.

So the sha1 hash does not replace _trust_. That comes from something else 
altogether.

                        Linus
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to