Re: [Python-Dev] Hash collision security issue (now public)

2011-12-31 Thread martin
(Well, technically, you could use trees or some other O log n data structure as a fallback once you have too many collisions, for some value of too many. Seems a bit wasteful for the purpose, though.) I don't think that would be wasteful. You wouldn't just use the tree for the case of too many

Re: [Python-Dev] Hash collision security issue (now public)

2011-12-31 Thread Antoine Pitrou
On Sat, 31 Dec 2011 16:56:00 -0700 Guido van Rossum gu...@python.org wrote: ISTM the only reasonable thing is to have a random seed picked very early in the process, to be used to change the hash() function of str/bytes/unicode (in a way that they are still compatible with each other). Do str

Re: [Python-Dev] Hash collision security issue (now public)

2011-12-31 Thread Guido van Rossum
On Sat, Dec 31, 2011 at 9:11 PM, Antoine Pitrou solip...@pitrou.net wrote: On Sat, 31 Dec 2011 16:56:00 -0700 Guido van Rossum gu...@python.org wrote: ISTM the only reasonable thing is to have a random seed picked very early in the process, to be used to change the hash() function of

Re: [Python-Dev] Hash collision security issue (now public)

2011-12-31 Thread Guido van Rossum
On Sat, Dec 31, 2011 at 8:29 PM, Paul McMillan p...@mcmillan.ws wrote: I'm not too concerned about a 3rd party being able to guess the random seed -- this would require much more effort on their part, since they would have to generate a new set of colliding keys each time they think they

Re: [Python-Dev] Hash collision security issue (now public)

2011-12-31 Thread Paul McMillan
Still, it would represent an effort for the attacker of a much greater magnitude than the current attack. It's all a trade-off -- at some point it'll just be easier for the attacker to use some other vulnerability. Also the class of vulnerable servers would be greatly reduced. I agree that

[Python-Dev] Hash collision security issue (now public)

2011-12-30 Thread Jim Jewett
In http://mail.python.org/pipermail/python-dev/2011-December/115138.html, Christian Heimes pointed out that ... we don't have to alter the outcome of hash ... We just need to reduce the chance that an attacker can produce collisions in the dict (and set?) I'll state it more strongly. hash

Re: [Python-Dev] Hash collision security issue (now public)

2011-12-30 Thread Victor Stinner
Le 29/12/2011 02:28, Michael Foord a écrit : A paper (well, presentation) has been published highlighting security problems with the hashing algorithm (exploiting collisions) in many programming languages Python included:

Re: [Python-Dev] Hash collision security issue (now public)

2011-12-30 Thread Steven D'Aprano
Jim Jewett wrote: My personal opinion is that accepting *and parsing* enough data for this to be a problem is enough of an edge case that I don't want normal dicts slowed down at all for this; I would therefore prefer that the change be restricted to such a compile-time switch, with current

Re: [Python-Dev] Hash collision security issue (now public)

2011-12-30 Thread Victor Stinner
Le 29/12/2011 14:19, Christian Heimes a écrit : Perhaps the dict code is a better place for randomization. The problem is the creation of a dict with keys all having the same hash value. The current implementation of dict uses a linked-list. Adding a new item requires to compare the new key

Re: [Python-Dev] Hash collision security issue (now public)

2011-12-30 Thread Victor Stinner
In case the watchdog is not a viable solution as I had assumed it was, I think it's more reasonable to indeed consider adding a flag to Python that allows randomization of hashes optionally before startup. A flag will only be needed if the overhead of the fix is too high. However as it was

Re: [Python-Dev] Hash collision security issue (now public)

2011-12-30 Thread Christian Heimes
Am 31.12.2011 03:31, schrieb Victor Stinner: Le 29/12/2011 14:19, Christian Heimes a écrit : Perhaps the dict code is a better place for randomization. The problem is the creation of a dict with keys all having the same hash value. The current implementation of dict uses a linked-list.

Re: [Python-Dev] Hash collision security issue (now public)

2011-12-30 Thread Christian Heimes
Am 31.12.2011 03:19, schrieb Steven D'Aprano: How about using a similar strategy to the current dict behaviour with __missing__ and defaultdict? Here's my suggestion: - If a dict subclass defines __salt__, then it is called to salt the hash value before lookups. If __salt__ is

Re: [Python-Dev] Hash collision security issue (now public)

2011-12-30 Thread Terry Reedy
On 12/30/2011 8:04 PM, Jim Jewett wrote: I'll state it more strongly. hash probably should not change (at least for this), I agree, especially since the vulnerability can be avoided by using 64 bit servers and will generally abate as more switch anyway. but we may want to consider a

Re: [Python-Dev] Hash collision security issue (now public)

2011-12-29 Thread Fred Drake
On Thu, Dec 29, 2011 at 8:19 AM, Christian Heimes li...@cheimes.de wrote: Persistence layers like ZODB and cross interpreter communication channels used by multiprocessing may (!) rely on the fact that the hash of a string is fixed. ZODB does not rely on a fixed hash function for strings; for

Re: [Python-Dev] Hash collision security issue (now public)

2011-12-29 Thread Armin Ronacher
Hi, Just some extra thoughts about the whole topic in the light of web applications (since this was hinted in the talk) running on Python: Yes, you can limit the number of maximum allowed parameters for post data but really there are so many places where data is parsed into hashing

Re: [Python-Dev] Hash collision security issue (now public)

2011-12-29 Thread Armin Ronacher
Hi, Something I should add to this now that I thought about it a bit more: Assuming this should be fixed on a language level the solution would probably be to salt hashes. The most common hash to salt here is the PyUnicode hash for obvious reasons. - Option a: Compiled in Salt + Easy to

Re: [Python-Dev] Hash collision security issue (now public)

2011-12-29 Thread Ned Batchelder
On 12/28/2011 9:09 PM, Raymond Hettinger wrote: Also, randomizing the hash wreaks havoc on doctests, book examples not matching actual dict reprs, and on efforts by users to optimize the insertion order into dicts with frequent lookups. I don't have a strong opinion about what to do about this

Re: [Python-Dev] Hash collision security issue (now public)

2011-12-29 Thread Mark Shannon
Michael Foord wrote: Hello all, A paper (well, presentation) has been published highlighting security problems with the hashing algorithm (exploiting collisions) in many programming languages Python included:

Re: [Python-Dev] Hash collision security issue (now public)

2011-12-29 Thread geremy condra
On Wed, Dec 28, 2011 at 8:49 PM, Eric Snow ericsnowcurren...@gmail.com wrote: On Wed, Dec 28, 2011 at 6:28 PM, Michael Foord fuzzy...@voidspace.org.uk wrote: Hello all, A paper (well, presentation) has been published highlighting security problems with the hashing algorithm (exploiting

Re: [Python-Dev] Hash collision security issue (now public)

2011-12-29 Thread Antoine Pitrou
On Thu, 29 Dec 2011 03:55:22 +0100 Christian Heimes li...@cheimes.de wrote: I've been dealing with web stuff and security for almost a decade. I've seen far worse attack vectors. This one can easily be solved with a couple of lines of Python code. For example Application developers can limit

Re: [Python-Dev] Hash collision security issue (now public)

2011-12-29 Thread Antoine Pitrou
On Thu, 29 Dec 2011 04:04:17 +0100 Christian Heimes li...@cheimes.de wrote: Am 29.12.2011 02:37, schrieb Jesse Noller: Back up link for the PDF: http://dl.dropbox.com/u/1374/2007_28C3_Effective_DoS_on_web_application_platforms.pdf Ocert disclosure:

Re: [Python-Dev] Hash collision security issue (now public)

2011-12-29 Thread Mark Shannon
Raymond Hettinger wrote: FWIW, Uncle Timmy considers the non-randomized hashes to be a virtue. It is believed that they give us better-than-random results for commonly encountered datasets. A change to randomized hashes would have a negative performance impact on those cases. Tim Peter's

Re: [Python-Dev] Hash collision security issue (now public)

2011-12-29 Thread Antoine Pitrou
On Thu, 29 Dec 2011 14:32:21 +0100 Christian Heimes li...@cheimes.de wrote: Am 29.12.2011 13:57, schrieb Armin Ronacher: Hi, Something I should add to this now that I thought about it a bit more: Assuming this should be fixed on a language level the solution would probably be to

Re: [Python-Dev] Hash collision security issue (now public)

2011-12-29 Thread Hynek Schlawack
Hi, how about Option d: Host based salt + Easy-ish to implement – how about basing it on the hostname for example? + transparent for all processes on the same host - probably unit test breakage In fact, we could use host based as default with the option to specify own which would solve

Re: [Python-Dev] Hash collision security issue (now public)

2011-12-29 Thread Antoine Pitrou
On Thu, 29 Dec 2011 11:25:03 + Mark Shannon m...@hotpy.org wrote: Also, randomizing the hash wreaks havoc on doctests, book examples not matching actual dict reprs, and on efforts by users to optimize the insertion order into dicts with frequent lookups. The docs clearly state that

Re: [Python-Dev] Hash collision security issue (now public)

2011-12-29 Thread Christian Heimes
Am 29.12.2011 12:13, schrieb Mark Shannon: The attack relies on being able to predict the hash value for a given string. Randomising the string hash function is quite straightforward. There is no need to change the dictionary code. A possible (*untested*) patch is attached. I'll leave it for

Re: [Python-Dev] Hash collision security issue (now public)

2011-12-29 Thread Christian Heimes
Am 29.12.2011 12:10, schrieb Antoine Pitrou: I've been dealing with web stuff and security for almost a decade. I've seen far worse attack vectors. This one can easily be solved with a couple of lines of Python code. For example Application developers can limit the maximum amount of POST

Re: [Python-Dev] Hash collision security issue (now public)

2011-12-29 Thread M.-A. Lemburg
Mark Shannon wrote: Michael Foord wrote: Hello all, A paper (well, presentation) has been published highlighting security problems with the hashing algorithm (exploiting collisions) in many programming languages Python included:

Re: [Python-Dev] Hash collision security issue (now public)

2011-12-29 Thread Antoine Pitrou
On Thu, 29 Dec 2011 13:57:07 +0100 Armin Ronacher armin.ronac...@active-4.com wrote: - Option c: Random salt at runtime + Easy to implement - impossible to synchronize - breaks unittests in the same way as a compiled in salt would do This option would have my preference. I don't think

Re: [Python-Dev] Hash collision security issue (now public)

2011-12-29 Thread Christian Heimes
Am 29.12.2011 11:32, schrieb Antoine Pitrou: If I remember correctly CPython uses the long for its hash function so 64bit Windows uses a 32bit hash. Not anymore, Py_hash_t is currently aligned with Py_ssize_t. Thanks for the update. Python 2.x still uses long and several large frameworks

Re: [Python-Dev] Hash collision security issue (now public)

2011-12-29 Thread Christian Heimes
Am 29.12.2011 13:57, schrieb Armin Ronacher: Hi, Something I should add to this now that I thought about it a bit more: Assuming this should be fixed on a language level the solution would probably be to salt hashes. The most common hash to salt here is the PyUnicode hash for obvious

Re: [Python-Dev] Hash collision security issue (now public)

2011-12-29 Thread Tim Delaney
+1 to option d (host-based salt) but would need to consistently order the hostnames/addresses to guarantee that all processes on the same machine got the same salt by default. +1 to option c (environment variable) as an override. And/or maybe an override on the command line. +1 to implementing

Re: [Python-Dev] Hash collision security issue (now public)

2011-12-29 Thread Tim Delaney
+1 to option c (environment variable) as an override. And/or maybe an override on the command line. That obviously should have said option b (environment variable) ... Tim Delaney ___ Python-Dev mailing list Python-Dev@python.org

Re: [Python-Dev] Hash collision security issue (now public)

2011-12-29 Thread PJ Eby
On Thu, Dec 29, 2011 at 8:32 AM, Christian Heimes li...@cheimes.de wrote: IMHO we don't have to alter the outcome of hash(some string), hash(1) and all other related types. We just need to reduce the change the an attacker can produce collisions in the dict (and set?) code that looks up the

Re: [Python-Dev] Hash collision security issue (now public)

2011-12-29 Thread Paul McMillan
It's worth pointing out that if the salt is somehow exposed to an attacker, or is guessable, much of the benefit goes away. It's likely that a timing attack could be used to discover the salt if it is fixed per machine or process over a long period of time. If a salt is generally fixed per

Re: [Python-Dev] Hash collision security issue (now public)

2011-12-29 Thread Christian Heimes
Am 29.12.2011 21:07, schrieb PJ Eby: I don't understand how that helps a collision attack. If you can still generate two strings with the same (pre-randomized) hash, what difference does it make that the dict adds a random number? The post-randomized number will still be the same, no? Or

Re: [Python-Dev] Hash collision security issue (now public)

2011-12-29 Thread Terry Reedy
The talk was a presentation yesterday by Alexander Klink and Julian Wälde at the Chaos Communication Congress in Germany hash...@alech.de I read the non-technical summary at http://arstechnica.com/business/news/2011/12/huge-portions-of-web-vulnerable-to-hashing-denial-of-service-attack.ars

Re: [Python-Dev] Hash collision security issue (now public)

2011-12-29 Thread Terry Reedy
On 12/29/2011 4:31 PM, Christian Heimes wrote: The hash randomization idea adds a salt to throw the attacker of course. Instead of position = hash mask it's now hash = salt + hash As I understood the talk (actually, the bit of Perl interpreter C code shown), the randomization is to

Re: [Python-Dev] Hash collision security issue (now public)

2011-12-29 Thread Christian Heimes
Am 29.12.2011 23:28, schrieb Terry Reedy: As I understood the talk (actually, the bit of Perl interpreter C code shown), the randomization is to change hash(s) to hash(salt+s) so that the salt is completely mixed into the hash from the beginning, rather than just tacked on at the end. Yes,

Re: [Python-Dev] Hash collision security issue (now public)

2011-12-29 Thread Tim Delaney
On 30 December 2011 06:59, Tim Delaney timothy.c.dela...@gmail.com wrote: +0 to exposing the salt as a constant (3.3+ only) - or alternatively expose a hash function that just takes an existing hash and returns the salted hash. That would make it very easy for anything that wanted a salted

[Python-Dev] Hash collision security issue (now public)

2011-12-28 Thread Michael Foord
Hello all, A paper (well, presentation) has been published highlighting security problems with the hashing algorithm (exploiting collisions) in many programming languages Python included:

Re: [Python-Dev] Hash collision security issue (now public)

2011-12-28 Thread Jesse Noller
On Wednesday, December 28, 2011 at 8:28 PM, Michael Foord wrote: Hello all, A paper (well, presentation) has been published highlighting security problems with the hashing algorithm (exploiting collisions) in many programming languages Python included:

Re: [Python-Dev] Hash collision security issue (now public)

2011-12-28 Thread Jesse Noller
On Wednesday, December 28, 2011 at 8:37 PM, Jesse Noller wrote: On Wednesday, December 28, 2011 at 8:28 PM, Michael Foord wrote: Hello all, A paper (well, presentation) has been published highlighting security problems with the hashing algorithm (exploiting collisions) in

Re: [Python-Dev] Hash collision security issue (now public)

2011-12-28 Thread Eric Snow
On Wed, Dec 28, 2011 at 6:28 PM, Michael Foord fuzzy...@voidspace.org.uk wrote: Hello all, A paper (well, presentation) has been published highlighting security problems with the hashing algorithm (exploiting collisions) in many programming languages Python included:        

Re: [Python-Dev] Hash collision security issue (now public)

2011-12-28 Thread Alex Gaynor
A few thoughts on this: a) This is not a new issue, I'm curious what the new interest is in it. b) Whatever the solution to this is, it is *not* CPython specific, any decision should be reflected in the Python language spec IMO, if CPython has the semantic that dicts aren't vulnerable to hash

Re: [Python-Dev] Hash collision security issue (now public)

2011-12-28 Thread Raymond Hettinger
FWIW, Uncle Timmy considers the non-randomized hashes to be a virtue. It is believed that they give us better-than-random results for commonly encountered datasets. A change to randomized hashes would have a negative performance impact on those cases. Also, randomizing the hash wreaks havoc on

Re: [Python-Dev] Hash collision security issue (now public)

2011-12-28 Thread Christian Heimes
Am 29.12.2011 02:37, schrieb Jesse Noller: Back up link for the PDF: http://dl.dropbox.com/u/1374/2007_28C3_Effective_DoS_on_web_application_platforms.pdf Ocert disclosure: http://www.ocert.org/advisories/ocert-2011-003.html From http://www.nruns.com/_downloads/advisory28122011.pdf ---

Re: [Python-Dev] Hash collision security issue (now public)

2011-12-28 Thread Christian Heimes
Am 29.12.2011 03:09, schrieb Raymond Hettinger: FWIW, Uncle Timmy considers the non-randomized hashes to be a virtue. It is believed that they give us better-than-random results for commonly encountered datasets. A change to randomized hashes would have a negative performance impact on those

Re: [Python-Dev] Hash collision security issue (now public)

2011-12-28 Thread Brian Curtin
On Wed, Dec 28, 2011 at 19:51, Alex Gaynor alex.gay...@gmail.com wrote: A few thoughts on this: a) This is not a new issue, I'm curious what the new interest is in it. Well they (the presenters of the report) had to be accepted to that conference for *something*, otherwise we wouldn't know

<    1   2