Re: [Python-Dev] Status of the fix for the hash collision vulnerability

2012-01-18 Thread Victor Stinner
2012/1/18 Martin v. Löwis mar...@v.loewis.de: For 3.3 onwards, I'm skeptical whether all this configuration support is really necessary. I think a much smaller patch which leaves no choice would be more appropriate. The configuration helps unit testing: see changes on Lib/test/*.py in my last

Re: [Python-Dev] Status of the fix for the hash collision vulnerability

2012-01-18 Thread Hrvoje Niksic
On 01/17/2012 09:29 PM, Martin v. Löwis wrote: I(0) = H MASK PERTURB(0) = H I(n+1) = (5*I(n) + 1 + PERTURB(n)) MASK PERTURN(n+1) = PERTURB(n) 5 So if two objects O1 and O2 have the same hash value H, the sequence of probed indices is the same for any MASK value. It will be a

Re: [Python-Dev] Status of the fix for the hash collision vulnerability

2012-01-17 Thread martin
It doesn't change anything, you will still get collisions. That depends right? If the collision is because they all have the same hash(), yes. It might be different if it is because the secondary hashing (or whatever it's called :-) causes collisions. But Python deals with the latter case

Re: [Python-Dev] Status of the fix for the hash collision vulnerability

2012-01-17 Thread Stephen J. Turnbull
mar...@v.loewis.de writes: It doesn't change anything, you will still get collisions. That depends right? If the collision is because they all have the same hash(), yes. It might be different if it is because the secondary hashing (or whatever it's called :-) causes collisions.

Re: [Python-Dev] Status of the fix for the hash collision vulnerability

2012-01-17 Thread Victor Stinner
I thought that the original problem was that with N insertions in the dictionary, by repeatedly inserting different keys generating the same hash value an attacker could arrange that the cost of finding an open slot is O(N), and thus the cost of N insertions is O(N^2). If so, frequent

Re: [Python-Dev] Status of the fix for the hash collision vulnerability

2012-01-17 Thread Victor Stinner
I finished my patch transforming hash(str) to a randomized hash function, see random-8.patch attached to the issue: http://bugs.python.org/issue13703 The remaining question is which random number generator should be used on Windows to initialize the hash secret (CryptoGen adds an overhead of 10%,

Re: [Python-Dev] Status of the fix for the hash collision vulnerability

2012-01-17 Thread Jeremy Sanders
Victor Stinner wrote: If hash(str1)DICT_MASK == hash(str2)DICT_MASK but hash(str1)!=hash(str2), strings are not compared (this is a common optimization in Python), and the so the attack would not be successful (it would be slow, but not as slow as comparing two strings). It's a shame the

Re: [Python-Dev] Status of the fix for the hash collision vulnerability

2012-01-17 Thread Jeremy Sanders
Jeremy Sanders wrote: Victor Stinner wrote: If hash(str1)DICT_MASK == hash(str2)DICT_MASK but hash(str1)!=hash(str2), strings are not compared (this is a common optimization in Python), and the so the attack would not be successful (it would be slow, but not as slow as comparing two

Re: [Python-Dev] Status of the fix for the hash collision vulnerability

2012-01-17 Thread Martin v. Löwis
I thought that the original problem was that with N insertions in the dictionary, by repeatedly inserting different keys generating the same hash value an attacker could arrange that the cost of finding an open slot is O(N), and thus the cost of N insertions is O(N^2). If so, frequent

Re: [Python-Dev] Status of the fix for the hash collision vulnerability

2012-01-17 Thread Martin v. Löwis
I plan to commit my fix to Python 3.3 if it is accepted. Then write a simplified version to Python 3.2 and backport it to 3.1. I'm opposed to any change to the hash values of strings in maintenance releases, so I guess I'm opposed to your patch in principle. See my next message for an

Re: [Python-Dev] Status of the fix for the hash collision vulnerability

2012-01-17 Thread Victor Stinner
I plan to commit my fix to Python 3.3 if it is accepted. Then write a simplified version to Python 3.2 and backport it to 3.1. I'm opposed to any change to the hash values of strings in maintenance releases, so I guess I'm opposed to your patch in principle. If randomized hash cannot be

Re: [Python-Dev] Status of the fix for the hash collision vulnerability

2012-01-17 Thread martin
If randomized hash cannot be turned on by default, an alternative is to switch them off by default, and add an option (command line option, environment variable, etc.) to enable it. That won't really fix the problem. If people install a new release because it fixes a vulnerability, it better

Re: [Python-Dev] Status of the fix for the hash collision vulnerability

2012-01-17 Thread Gregory P. Smith
On Tue, Jan 17, 2012 at 12:52 PM, Martin v. Löwis mar...@v.loewis.dewrote: I plan to commit my fix to Python 3.3 if it is accepted. Then write a simplified version to Python 3.2 and backport it to 3.1. I'm opposed to any change to the hash values of strings in maintenance releases, so I

Re: [Python-Dev] Status of the fix for the hash collision vulnerability

2012-01-17 Thread Martin v. Löwis
Am 18.01.2012 07:06, schrieb Gregory P. Smith: On Tue, Jan 17, 2012 at 12:52 PM, Martin v. Löwis mar...@v.loewis.de mailto:mar...@v.loewis.de wrote: I plan to commit my fix to Python 3.3 if it is accepted. Then write a simplified version to Python 3.2 and backport it to 3.1.

Re: [Python-Dev] Status of the fix for the hash collision vulnerability

2012-01-16 Thread Gregory P. Smith
On Sun, Jan 15, 2012 at 9:44 AM, Guido van Rossum gu...@python.org wrote: On Sun, Jan 15, 2012 at 8:46 AM, Stefan Behnel stefan...@behnel.dewrote: Guido van Rossum, 15.01.2012 17:10: On Sun, Jan 15, 2012 at 6:30 AM, Stefan Behnel wrote: Terry Reedy, 14.01.2012 06:43: On 1/13/2012 8:58

Re: [Python-Dev] Status of the fix for the hash collision vulnerability

2012-01-16 Thread Paul McMillan
As I understand it, the way the attack works is that a *single* malicious request from the attacker can DoS the server by eating CPU resources while evaluating a massive collision chain induced in a dict by attacker supplied data. Explicitly truncating the collision chain boots them out

Re: [Python-Dev] Status of the fix for the hash collision vulnerability

2012-01-16 Thread Tim Delaney
On 17 January 2012 09:23, Paul McMillan p...@mcmillan.ws wrote: This is why the simply throw an error solution isn't a complete fix. Making portions of an interface unusable for regular users is clearly a bad thing, and is clearly applicable to other types of poisoned data as well. We need to

Re: [Python-Dev] Status of the fix for the hash collision vulnerability

2012-01-16 Thread Tim Delaney
On 17 January 2012 10:14, Tim Delaney timothy.c.dela...@gmail.com wrote: On 17 January 2012 09:23, Paul McMillan p...@mcmillan.ws wrote: This is why the simply throw an error solution isn't a complete fix. Making portions of an interface unusable for regular users is clearly a bad thing, and

Re: [Python-Dev] Status of the fix for the hash collision vulnerability

2012-01-16 Thread Victor Stinner
2012/1/17 Tim Delaney timothy.c.dela...@gmail.com: What if in a pathological collision (e.g. 1000 collisions), we increased the size of a dict by a small but random amount? It doesn't change anything, you will still get collisions. Victor ___

Re: [Python-Dev] Status of the fix for the hash collision vulnerability

2012-01-16 Thread Guido van Rossum
On Mon, Jan 16, 2012 at 4:16 PM, Victor Stinner victor.stin...@haypocalc.com wrote: 2012/1/17 Tim Delaney timothy.c.dela...@gmail.com: What if in a pathological collision (e.g. 1000 collisions), we increased the size of a dict by a small but random amount? It doesn't change anything,

Re: [Python-Dev] Status of the fix for the hash collision vulnerability

2012-01-15 Thread Stefan Behnel
Terry Reedy, 14.01.2012 06:43: On 1/13/2012 8:58 PM, Gregory P. Smith wrote: It is perfectly okay to break existing users who had anything depending on ordering of internal hash tables. Their code was already broken. Given that the doc says Return the hash value of the object, I do not

Re: [Python-Dev] Status of the fix for the hash collision vulnerability

2012-01-15 Thread Guido van Rossum
On Sun, Jan 15, 2012 at 6:30 AM, Stefan Behnel stefan...@behnel.de wrote: Terry Reedy, 14.01.2012 06:43: On 1/13/2012 8:58 PM, Gregory P. Smith wrote: It is perfectly okay to break existing users who had anything depending on ordering of internal hash tables. Their code was already

Re: [Python-Dev] Status of the fix for the hash collision vulnerability

2012-01-15 Thread Stefan Behnel
Guido van Rossum, 15.01.2012 17:10: On Sun, Jan 15, 2012 at 6:30 AM, Stefan Behnel wrote: Terry Reedy, 14.01.2012 06:43: On 1/13/2012 8:58 PM, Gregory P. Smith wrote: It is perfectly okay to break existing users who had anything depending on ordering of internal hash tables. Their code was

Re: [Python-Dev] Status of the fix for the hash collision vulnerability

2012-01-15 Thread Gregory P. Smith
On Sun, Jan 15, 2012 at 8:46 AM, Stefan Behnel stefan...@behnel.de wrote: It also seems to me that the wording has a hash value which never changes during its lifetime makes it pretty clear that the lifetime of the hash value is not guaranteed to supersede the lifetime of the object (although

Re: [Python-Dev] Status of the fix for the hash collision vulnerability

2012-01-15 Thread Antoine Pitrou
On Sun, 15 Jan 2012 17:46:36 +0100 Stefan Behnel stefan...@behnel.de wrote: Guido van Rossum, 15.01.2012 17:10: On Sun, Jan 15, 2012 at 6:30 AM, Stefan Behnel wrote: Terry Reedy, 14.01.2012 06:43: On 1/13/2012 8:58 PM, Gregory P. Smith wrote: It is perfectly okay to break existing users

Re: [Python-Dev] Status of the fix for the hash collision vulnerability

2012-01-15 Thread Guido van Rossum
On Sun, Jan 15, 2012 at 8:46 AM, Stefan Behnel stefan...@behnel.de wrote: Guido van Rossum, 15.01.2012 17:10: On Sun, Jan 15, 2012 at 6:30 AM, Stefan Behnel wrote: Terry Reedy, 14.01.2012 06:43: On 1/13/2012 8:58 PM, Gregory P. Smith wrote: It is perfectly okay to break existing users

Re: [Python-Dev] Status of the fix for the hash collision vulnerability

2012-01-14 Thread Stephen J. Turnbull
Jack Diederich writes: On Thu, Jan 12, 2012 at 9:57 PM, Guido van Rossum gu...@python.org wrote: Hm... I started out as a big fan of the randomized hash, but thinking more about it, I actually believe that the chances of some legitimate app having 1000 collisions are way smaller than the

Re: [Python-Dev] Status of the fix for the hash collision vulnerability

2012-01-14 Thread Antoine Pitrou
On Sat, 14 Jan 2012 04:45:57 +0100 mar...@v.loewis.de wrote: What an implementation looks like: http://pastebin.com/9ydETTag some stuff to be filled in, but this is all that is really required. I think this statement (and the patch) is wrong. You also need to change the byte string

Re: [Python-Dev] Status of the fix for the hash collision vulnerability

2012-01-14 Thread Antoine Pitrou
On Sat, 14 Jan 2012 13:55:22 +1100 Steven D'Aprano st...@pearwood.info wrote: On 14/01/12 12:58, Gregory P. Smith wrote: I do like *randomly seeding the hash*. *+1*. This is easy. It can easily be back ported to any Python version. It is perfectly okay to break existing users who had

Re: [Python-Dev] Status of the fix for the hash collision vulnerability

2012-01-14 Thread martin
I think this statement (and the patch) is wrong. You also need to change the byte string hashing, at least for 2.x. This I consider the biggest flaw in that approach - other people may have written string-like objects which continue to compare equal to a string but now hash different. They're

Re: [Python-Dev] Status of the fix for the hash collision vulnerability

2012-01-14 Thread Martin v. Löwis
Am 13.01.2012 18:08, schrieb Mark Dickinson: On Fri, Jan 13, 2012 at 2:57 AM, Guido van Rossum gu...@python.org wrote: How pathological the data needs to be before the collision counter triggers? I'd expect *very* pathological. How pathological do you consider the set {1 n for n in

Re: [Python-Dev] Status of the fix for the hash collision vulnerability

2012-01-14 Thread Martin v. Löwis
Am 14.01.2012 01:37, schrieb Benjamin Peterson: 2012/1/13 Guido van Rossum gu...@python.org: Really? Even though you came up with specifically to prove me wrong? Coming up with a counterexample now invalidates it? There are two concerns here: - is it possible to come up with an example of

Re: [Python-Dev] Status of the fix for the hash collision vulnerability

2012-01-14 Thread Gregory P. Smith
My patch example does change the bytes object hash as well as Unicode. On Jan 13, 2012 7:46 PM, mar...@v.loewis.de wrote: What an implementation looks like: http://pastebin.com/9ydETTag some stuff to be filled in, but this is all that is really required. I think this statement (and the

Re: [Python-Dev] Status of the fix for the hash collision vulnerability

2012-01-14 Thread Gregory P. Smith
FWIW the quick change i pastebin'ed is basically covered by the change already under review in http://bugs.python.org/review/13704/show. I've made my comments and suggestions there. I looked into Modules/expat/xmlparse.c and it has an odd copy of the old string hash algorithm entirely for its

Re: [Python-Dev] Status of the fix for the hash collision vulnerability

2012-01-14 Thread Steven D'Aprano
Victor Stinner wrote: - Marc Andre Lemburg proposes to fix the vulnerability directly in dict (for any key type). The patch raises an exception if a lookup causes more than 1000 collisions. Am I missing something? How does this fix the vulnerability? It seems to me that the only thing this

Re: [Python-Dev] Status of the fix for the hash collision vulnerability

2012-01-14 Thread Steven D'Aprano
Guido van Rossum wrote: On Fri, Jan 13, 2012 at 5:58 PM, Gregory P. Smith g...@krypto.org wrote: It is perfectly okay to break existing users who had anything depending on ordering of internal hash tables. Their code was already broken. We *will*provide a flag and/or environment variable that

Re: [Python-Dev] Status of the fix for the hash collision vulnerability

2012-01-14 Thread Nick Coghlan
On Sun, Jan 15, 2012 at 2:42 PM, Steven D'Aprano st...@pearwood.info wrote: Victor Stinner wrote: - Marc Andre Lemburg proposes to fix the vulnerability directly in dict (for any key type). The patch raises an exception if a lookup causes more than 1000 collisions. Am I missing something?

Re: [Python-Dev] Status of the fix for the hash collision vulnerability

2012-01-13 Thread Frank Sievertsen
Am 13.01.2012 02:24, schrieb Victor Stinner: My patch doesn't fix the DoS, it just make the attack more complex. The attacker cannot pregenerate data for an attack: (s)he has first to compute the hash secret, and then compute hash collisions using the secret. The hash secret is a least 64 bits

Re: [Python-Dev] Status of the fix for the hash collision vulnerability

2012-01-13 Thread Victor Stinner
Unfortunately it requires only a few seconds to compute enough 32bit collisions on one core with no precomputed data. Are you running the hash function backward to generate strings with the same value, or you are more trying something like brute forcing? And how do you get the hash secret? You

Re: [Python-Dev] Status of the fix for the hash collision vulnerability

2012-01-13 Thread Lennart Regebro
On Fri, Jan 13, 2012 at 02:24, Victor Stinner victor.stin...@haypocalc.com wrote: - Glenn Linderman proposes to fix the vulnerability by adding a new safe dict type (only accepting string keys). His proof-of-concept (SafeDict.py) uses a secret of 64 random bits and uses it to compute the hash

Re: [Python-Dev] Status of the fix for the hash collision vulnerability

2012-01-13 Thread Frank Sievertsen
Unfortunately it requires only a few seconds to compute enough 32bit collisions on one core with no precomputed data. Are you running the hash function backward to generate strings with the same value, or you are more trying something like brute forcing? If you try it brute force to hit a

Re: [Python-Dev] Status of the fix for the hash collision vulnerability

2012-01-13 Thread And Clover
On 2012-01-13 11:20, Lennart Regebro wrote: The vulnerability is basically only in the dictionary you keep the form data you get from a request. I'd have to disagree with this statement. The vulnerability is anywhere that creates a dictionary (or set) from attacker-provided keys. That would

Re: [Python-Dev] Status of the fix for the hash collision vulnerability

2012-01-13 Thread Mark Dickinson
On Fri, Jan 13, 2012 at 2:57 AM, Guido van Rossum gu...@python.org wrote: How pathological the data needs to be before the collision counter triggers? I'd expect *very* pathological. How pathological do you consider the set {1 n for n in range(2000)} to be? What about the set:

Re: [Python-Dev] Status of the fix for the hash collision vulnerability

2012-01-13 Thread Guido van Rossum
On Fri, Jan 13, 2012 at 9:08 AM, Mark Dickinson dicki...@gmail.com wrote: On Fri, Jan 13, 2012 at 2:57 AM, Guido van Rossum gu...@python.org wrote: How pathological the data needs to be before the collision counter triggers? I'd expect *very* pathological. How pathological do you

Re: [Python-Dev] Status of the fix for the hash collision vulnerability

2012-01-13 Thread Mark Dickinson
On Fri, Jan 13, 2012 at 5:43 PM, Guido van Rossum gu...@python.org wrote: How pathological do you consider the set   {1 n for n in range(2000)} to be?  What about the set:   ieee754_powers_of_two = {2.0**n for n in range(-1074, 1024)} ?  The 2000 elements of the latter set have only 61

Re: [Python-Dev] Status of the fix for the hash collision vulnerability

2012-01-13 Thread Guido van Rossum
On Fri, Jan 13, 2012 at 10:13 AM, Mark Dickinson dicki...@gmail.com wrote: On Fri, Jan 13, 2012 at 5:43 PM, Guido van Rossum gu...@python.org wrote: How pathological do you consider the set {1 n for n in range(2000)} to be? What about the set: ieee754_powers_of_two = {2.0**n

Re: [Python-Dev] Status of the fix for the hash collision vulnerability

2012-01-13 Thread Benjamin Peterson
2012/1/13 Guido van Rossum gu...@python.org: Really? Even though you came up with specifically to prove me wrong? Coming up with a counterexample now invalidates it? -- Regards, Benjamin ___ Python-Dev mailing list Python-Dev@python.org

Re: [Python-Dev] Status of the fix for the hash collision vulnerability

2012-01-13 Thread Antoine Pitrou
On Thu, 12 Jan 2012 18:57:42 -0800 Guido van Rossum gu...@python.org wrote: Hm... I started out as a big fan of the randomized hash, but thinking more about it, I actually believe that the chances of some legitimate app having 1000 collisions are way smaller than the chances that somebody's

Re: [Python-Dev] Status of the fix for the hash collision vulnerability

2012-01-13 Thread Victor Stinner
- Glenn Linderman proposes to fix the vulnerability by adding a new safe dict type (only accepting string keys). His proof-of-concept (SafeDict.py) uses a secret of 64 random bits and uses it to compute the hash of a key. We could mix Marc's collision counter with SafeDict idea (being able to

Re: [Python-Dev] Status of the fix for the hash collision vulnerability

2012-01-13 Thread Guido van Rossum
On Fri, Jan 13, 2012 at 5:17 PM, Antoine Pitrou solip...@pitrou.net wrote: On Thu, 12 Jan 2012 18:57:42 -0800 Guido van Rossum gu...@python.org wrote: Hm... I started out as a big fan of the randomized hash, but thinking more about it, I actually believe that the chances of some legitimate

Re: [Python-Dev] Status of the fix for the hash collision vulnerability

2012-01-13 Thread Gregory P. Smith
On Fri, Jan 13, 2012 at 5:38 PM, Guido van Rossum gu...@python.org wrote: On Fri, Jan 13, 2012 at 5:17 PM, Antoine Pitrou solip...@pitrou.netwrote: On Thu, 12 Jan 2012 18:57:42 -0800 Guido van Rossum gu...@python.org wrote: Hm... I started out as a big fan of the randomized hash, but

Re: [Python-Dev] Status of the fix for the hash collision vulnerability

2012-01-13 Thread Glenn Linderman
On 1/13/2012 5:35 PM, Victor Stinner wrote: - Glenn Linderman proposes to fix the vulnerability by adding a new safe dict type (only accepting string keys). His proof-of-concept (SafeDict.py) uses a secret of 64 random bits and uses it to compute the hash of a key. We could mix Marc's collision

Re: [Python-Dev] Status of the fix for the hash collision vulnerability

2012-01-13 Thread Gregory P. Smith
Clearly these ideas are more complex than adding randomization, but adding randomization doesn't seem to be produce immunity from attack, when data about the randomness is leaked. Which will not normally happen. I'm firmly in the camp that believes the random seed can be probed and

Re: [Python-Dev] Status of the fix for the hash collision vulnerability

2012-01-13 Thread Gregory P. Smith
btw, Tim's commit message on this one is amusingly relevant. :) http://hg.python.org/cpython/diff/8d2f2cb9/Objects/dictobject.c On Fri, Jan 13, 2012 at 6:25 PM, Gregory P. Smith g...@krypto.org wrote: Clearly these ideas are more complex than adding randomization, but adding

Re: [Python-Dev] Status of the fix for the hash collision vulnerability

2012-01-13 Thread Steven D'Aprano
On 14/01/12 12:58, Gregory P. Smith wrote: I do like *randomly seeding the hash*. *+1*. This is easy. It can easily be back ported to any Python version. It is perfectly okay to break existing users who had anything depending on ordering of internal hash tables. Their code was already broken.

Re: [Python-Dev] Status of the fix for the hash collision vulnerability

2012-01-13 Thread Gregory P. Smith
On Fri, Jan 13, 2012 at 5:58 PM, Gregory P. Smith g...@krypto.org wrote: On Fri, Jan 13, 2012 at 5:38 PM, Guido van Rossum gu...@python.orgwrote: On Fri, Jan 13, 2012 at 5:17 PM, Antoine Pitrou solip...@pitrou.netwrote: On Thu, 12 Jan 2012 18:57:42 -0800 Guido van Rossum gu...@python.org

Re: [Python-Dev] Status of the fix for the hash collision vulnerability

2012-01-13 Thread martin
What an implementation looks like: http://pastebin.com/9ydETTag some stuff to be filled in, but this is all that is really required. I think this statement (and the patch) is wrong. You also need to change the byte string hashing, at least for 2.x. This I consider the biggest flaw in that

Re: [Python-Dev] Status of the fix for the hash collision vulnerability

2012-01-13 Thread Guido van Rossum
On Fri, Jan 13, 2012 at 5:58 PM, Gregory P. Smith g...@krypto.org wrote: It is perfectly okay to break existing users who had anything depending on ordering of internal hash tables. Their code was already broken. We *will*provide a flag and/or environment variable that can be set to turn the

Re: [Python-Dev] Status of the fix for the hash collision vulnerability

2012-01-13 Thread Terry Reedy
On 1/13/2012 8:58 PM, Gregory P. Smith wrote: It is perfectly okay to break existing users who had anything depending on ordering of internal hash tables. Their code was already broken. Given that the doc says Return the hash value of the object, I do not think we should be so hard-nosed.

Re: [Python-Dev] Status of the fix for the hash collision vulnerability

2012-01-13 Thread Jack Diederich
On Thu, Jan 12, 2012 at 9:57 PM, Guido van Rossum gu...@python.org wrote: Hm... I started out as a big fan of the randomized hash, but thinking more about it, I actually believe that the chances of some legitimate app having 1000 collisions are way smaller than the chances that somebody's code

Re: [Python-Dev] Status of the fix for the hash collision vulnerability

2012-01-13 Thread Nick Coghlan
On Sat, Jan 14, 2012 at 4:24 PM, Jack Diederich jackd...@gmail.com wrote: This is depending on how the counting is done (I didn't look at MAL's patch), and assuming that increasing the hash table size will generally reduce collisions if items collide but their hashes are different. The patch

[Python-Dev] Status of the fix for the hash collision vulnerability

2012-01-12 Thread Victor Stinner
Many people proposed their own idea to fix the vulnerability, but only 3 wrote a patch: - Glenn Linderman proposes to fix the vulnerability by adding a new safe dict type (only accepting string keys). His proof-of-concept (SafeDict.py) uses a secret of 64 random bits and uses it to compute the

Re: [Python-Dev] Status of the fix for the hash collision vulnerability

2012-01-12 Thread Guido van Rossum
Hm... I started out as a big fan of the randomized hash, but thinking more about it, I actually believe that the chances of some legitimate app having 1000 collisions are way smaller than the chances that somebody's code will break due to the variable hashing. In fact we know for a fact that the