Re: [Python-Dev] Hash collision security issue (now public)

Victor Stinner Sun, 01 Jan 2012 09:36:18 -0800

On 01/01/2012 04:29, Paul McMillan wrote:

This is incorrect. Once an attacker has guessed the random seed, any
operation which reveals the ordering of hashed objects can be used to
verify the answer. JSON responses would be ideal. In fact, an attacker
can do a brute-force attack of the random seed offline. Once they have
the seed, generating collisions is a fast process.

If we want to protect a website against this attack, for example, we must suppose that the attacker can inject arbitrary data and can get (indirectly) the result of hash(str) (e.g. with the representation of a dict in a traceback, with a JSON output, etc.).
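A minimal sketch of this threat model, under loud assumptions: toy_salted_hash() is a hypothetical 32-bit salted hash (not CPython's algorithm), leaked_order() stands in for any output that reveals the ordering of hashed keys (a dict repr, a JSON response), and the seed space is shrunk to 12 bits so the search finishes instantly. It only illustrates that leaked ordering lets an attacker discard seed guesses offline.

```python
from typing import List, Sequence


def toy_salted_hash(text: str, seed: int) -> int:
    """Toy 32-bit salted hash (hypothetical; NOT CPython's algorithm)."""
    h = seed & 0xFFFFFFFF
    for byte in text.encode("utf-8"):
        h = ((h * 1000003) ^ byte) & 0xFFFFFFFF
    return h


def leaked_order(keys: Sequence[str], seed: int, table_size: int = 8) -> List[str]:
    """Order in which a naive hash table would list the keys (by bucket index)."""
    return sorted(keys, key=lambda k: (toy_salted_hash(k, seed) % table_size, k))


def brute_force_seed(keys: Sequence[str], observed: List[str],
                     seed_bits: int = 12) -> List[int]:
    """Offline search: keep every seed that reproduces the observed ordering."""
    return [s for s in range(1 << seed_bits) if leaked_order(keys, s) == observed]


KEYS = ["a", "b", "c", "d", "e", "f"]   # data injected by the attacker
SECRET_SEED = 0x5A3                      # known only to the server
observed = leaked_order(KEYS, SECRET_SEED)   # what the response reveals
candidates = brute_force_seed(KEYS, observed)
print(len(candidates), "of", 1 << 12, "seeds are still consistent")
```

How far the candidate list shrinks depends on the hash and on the injected keys; a real attack would repeat the step with more keys until few seeds remain.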

The goal isn't perfection, but we need to do better than a simple
salt.

I disagree. I don't want to break backward compatibility and have a hash() function that differs for each process if the change is not an effective protection against the "hash vulnerability".

It's really hard to write a good (secure) hash function: see for example the recent NIST competition (started in 2008, will end this year). Even good security researchers are unable to write a strong and fast hash function. It's easy to add a weakness to the function if you don't have a good background in cryptography. The NIST competition gives 4 years to analyze new hash functions. We should not rush to add a quick "hack" if it doesn't solve the problem correctly (protect against a collision attack and a preimage attack).


http://en.wikipedia.org/wiki/NIST_hash_function_competition
http://en.wikipedia.org/wiki/Collision_attack

Runtime performance does matter, and I'm not completely sure that Python itself is the best place to add a countermeasure against a vulnerability. I don't want to slow down numpy for a web vulnerability. Because there are different use cases, a better compromise is maybe to add a runtime option to use a secure hash function, and keep the unsafe but fast hash function by default.
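A minimal sketch of such an opt-in switch, under stated assumptions: the environment variable TOY_SECURE_HASH and both hash functions are hypothetical, and a keyed BLAKE2b digest from hashlib merely stands in for "a secure hash function" (that choice is not something proposed in this thread). The fast, predictable function stays the default.

```python
import hashlib
import os
from typing import Callable, Mapping


def fast_hash(data: bytes) -> int:
    """Toy unsalted 64-bit hash: fast, deterministic, and predictable."""
    h = 0
    for byte in data:
        h = ((h * 1000003) ^ byte) & 0xFFFFFFFFFFFFFFFF
    return h


def make_hasher(environ: Mapping[str, str] = os.environ) -> Callable[[bytes], int]:
    """Return the keyed hasher only if the hypothetical TOY_SECURE_HASH=1 is set."""
    if environ.get("TOY_SECURE_HASH") != "1":
        return fast_hash              # default: unchanged behaviour and speed

    key = os.urandom(16)              # small per-process secret

    def secure_hash(data: bytes) -> int:
        digest = hashlib.blake2b(data, digest_size=8, key=key).digest()
        return int.from_bytes(digest, "little")

    return secure_hash


default_hasher = make_hasher({})                           # e.g. numeric workloads
hardened_hasher = make_hasher({"TOY_SECURE_HASH": "1"})    # e.g. a web server
```

Passing the environment as a mapping keeps the sketch testable; a real option would more likely be read once at interpreter startup.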

I propose we modify the string hash function like this:

https://gist.github.com/0a91e52efa74f61858b5

Always allocating 2**21 bytes just to work around one specific kind of attack is not acceptable. I suppose that the maximum acceptable is 4096 bytes (or better, 256 bytes).

Cryptographic hash functions don't need random data, so why would Python need 2 MB (!) for its hash function?
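For scale, the sizes quoted above can be checked with a few lines of arithmetic; the constant names are only illustrative labels for the figures in this message.

```python
PROPOSED_TABLE_BYTES = 2 ** 21   # size of the proposed random-data table
LIMIT_BYTES = 4096               # suggested maximum
PREFERRED_BYTES = 256            # suggested better target

print(PROPOSED_TABLE_BYTES)                     # 2097152 bytes
print(PROPOSED_TABLE_BYTES // 2 ** 20)          # 2 (MiB)
print(PROPOSED_TABLE_BYTES // LIMIT_BYTES)      # 512 times the 4096-byte limit
print(PROPOSED_TABLE_BYTES // PREFERRED_BYTES)  # 8192 times the 256-byte target
```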


Victor
