Re: [Python-Dev] Hash collision security issue (now public)

2012-01-09 Thread Stefan Behnel
Jim Jewett, 08.01.2012 23:33:
 Stefan Behnel wrote:
 Admittedly, this may require some adaptation for the PEP393 unicode memory
 layout in order to produce identical hashes for all three representations
 if they represent the same content.
 
 They SHOULD NOT represent the same content; comparing two strings
 currently requires converting them to canonical form, which means the
 smallest format (of those three) that works.
 [...]
 That said, I don't think smallest-format is actually enforced with
 anything stronger than comments (such as in unicodeobject.h struct
 PyASCIIObject) and asserts (mostly calling
 _PyUnicode_CheckConsistency).

That's what I meant. AFAIR, the PEP393 discussions at some point brought up
the suspicion that third party code may end up generating Unicode strings
that do not comply with that invariant. So internal code shouldn't
strictly rely on it when it deals with user provided data. One example is
the unequal kinds optimisation in equality comparison, which, if I'm not
mistaken, wasn't implemented, due to exactly this reasoning. The same
applies to hashing then.

Stefan

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2012-01-09 Thread Victor Stinner
 That said, I don't think smallest-format is actually enforced with
 anything stronger than comments (such as in unicodeobject.h struct
 PyASCIIObject) and asserts (mostly calling
 _PyUnicode_CheckConsistency).  I don't have any insight on how
 prevalent non-conforming strings will be in practice, or whether
 supporting their equality will be required as a bugfix.

If you are only Python, you cannot create a string in a non canonical form.

If you use the C API, you can create a string in a non canonical form
using PyUnicode_New() + PyUnicode_WRITE, or
PyUnicode_FromUnicode(NULL, length) (or
PyUnicode_FromStringAndSize(NULL, length)) + direct access to the
Py_UNICODE* string. If you create strings in a non canonical form, it
is a bug in your application and Python doesn't help you. But how
could Python help you? Expose a function to check your newly creating
string? There is already _PyUnicode_CheckConsistency() which is slow
(O(n)) because it checks each character, it is only used in debug
mode.

Victor
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2012-01-08 Thread Brian Curtin
On Sun, Jan 8, 2012 at 16:33, Jim Jewett jimjjew...@gmail.com wrote:
 In http://mail.python.org/pipermail/python-dev/2012-January/115368.html
 Stefan Behnel wrote:

Can you please configure your mail client to not create new threads
like this? As if this topic wasn't already hard enough to follow, it
now exists across handfuls of threads with the same title.
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2012-01-07 Thread Stefan Behnel
Christian Heimes, 31.12.2011 04:59:
 Am 31.12.2011 03:22, schrieb Victor Stinner:
 The unique structure of CPython's dict implementation makes it harder to
 get the number of values with equal hash. The academic hash map (the one
 I learnt about at university) uses a bucket to store all elements with
 equal hash (more precise hash: mod mask). However Python's dict however
 perturbs the hash until it finds a free slot its array. The second,
 third ... collision can be caused by a legit and completely different
 (!) hash.
 
 The last choice is to change the hash algorithm. The *idea* is the same 
 than adding salt to hashed password (in practice it will be a little bit 
 different): if a pseudo-random salt is added, the attacker cannot 
 prepare a single dataset, he/she will have to regenerate a new dataset 
 for each possible salt value. If the salt is big enough (size in bits), 
 the attacker will need too much CPU to generate the dataset (compute N 
 keys with the same hash value). Basically, it slows down the attack by 
 2^(size of the salt).
 
 That's the idea of randomized hashing functions as implemented by Ruby
 1.8, Perl and others. The random seed is used as IV. Multiple rounds of
 multiply, XOR and MOD (integer overflows) cause a deviation. In your
 other posting you were worried about the performance implication. A
 randomized hash function just adds a single ADD operation, that's all.
 
 Downside: With randomization all hashes are unpredictable and change
 after every restart of the interpreter. This has some subtle side
 effects like a different outcome of {a:1, b:1, c:1}.keys() after a
 restart of the interpreter.
 
 Another possibility would be to replace our fast hash function by a 
 better hash function like MD5 or SHA1 (so the creation of the dataset 
 would be too slow in practice = too expensive), but cryptographic hash 
 functions are much slower (and so would slow down Python too much).
 
 I agree with your analysis. Cryptographic hash functions are far too
 slow for our use case. During my research I found another hash function
 that claims to be fast and that may not be vulnerable to this kind of
 attack: http://isthe.com/chongo/tech/comp/fnv/

Wouldn't Bob Jenkins' lookup3 hash function fit in here? After all, it's
portable, known to provide a very good distribution for different string
values and is generally fast on both 32 and 64 bit architectures.

http://burtleburtle.net/bob/c/lookup3.c

The analysis is here:

http://burtleburtle.net/bob/hash/doobs.html

It seems that there's also support for generating 64bit hash values
(actually 2x32bits) efficiently.

Admittedly, this may require some adaptation for the PEP393 unicode memory
layout in order to produce identical hashes for all three representations
if they represent the same content. So it's not a drop-in replacement.

Stefan

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2012-01-07 Thread Christian Heimes
Am 07.01.2012 12:02, schrieb Stefan Behnel:
 Wouldn't Bob Jenkins' lookup3 hash function fit in here? After all, it's
 portable, known to provide a very good distribution for different string
 values and is generally fast on both 32 and 64 bit architectures.
 
 http://burtleburtle.net/bob/c/lookup3.c
 
 The analysis is here:
 
 http://burtleburtle.net/bob/hash/doobs.html
 
 It seems that there's also support for generating 64bit hash values
 (actually 2x32bits) efficiently.

This thread as well as the ticket is getting so long that people barely
have a chance to catch up ...

Guido has stated that he doesn't want a completely new hash algorithm
for Python 2.x to 3.2. A new hash algorithm for 3.3 needs a PEP, too.

I've done some experiments with FNV and Murmur3. With Murmur3 128bit
I've seen some minor speed improvements on 64bit platforms. At first I
was surprised but it makes sense. Murmur3 operates on uint32_t blocks
while Python's hash algorithm iterates over 1 byte (bytes, ASCII), 2
bytes (USC2) or 4 bytes (USC4) types. Since most strings are either
ASCII or UCS2, the inner loop of the current algorithm is more tight.


 Admittedly, this may require some adaptation for the PEP393 unicode memory
 layout in order to produce identical hashes for all three representations
 if they represent the same content. So it's not a drop-in replacement.

Is this condition required and implemented at the moment?

Christian


___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2012-01-07 Thread Terry Reedy

On 1/7/2012 12:57 PM, Christian Heimes wrote:

Am 07.01.2012 12:02, schrieb Stefan Behnel:



Admittedly, this may require some adaptation for the PEP393 unicode memory
layout in order to produce identical hashes for all three representations
if they represent the same content. So it's not a drop-in replacement.


Is this condition required and implemented at the moment?


If o1 == o2, then hash(o1) == hash(o2) is an unstated requirement 
implied by They [hash values] are used to quickly compare dictionary 
keys during a dictionary lookup. since hash(o1) != hash(o2) is taken to 
mean o1 != o2 (whereas hash(o1) == hash(o2) is taken to mean o1 == o2 is 
possible but must be checked). Hashing should be a coarsening of == as 
an equivalence relationship.


--
Terry Jan Reedy

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2012-01-06 Thread Mark Shannon

Serhiy Storchaka wrote:

06.01.12 02:10, Nick Coghlan написав(ла):

Not a good idea - a lot of the 3rd party tests that depend on dict
ordering are going to be using those modules anyway, so scattering our
solution across half the standard library is needlessly creating
additional work without really reducing the incompatibility problem.
If we're going to change anything, it may as well be the string
hashing algorithm itself.


Changing the string hashing algorithm will hit the general performance 
and also will break down any code that depend on dict ordering. 
Specialized dict slow down only needed parts of some applications.


The minimal proposed change of seeding the hash from a global value (a 
single memory read and an addition) will have such a minimal performance 
effect that it will be undetectable even on the most noise-free testing 
environment.


Cheers,
Mark
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2012-01-06 Thread Steven D'Aprano

Glenn Linderman wrote:

On 1/5/2012 5:52 PM, Steven D'Aprano wrote:


At some point, presuming that there is no speed penalty, the behaviour 
will surely become not just enabled by default but mandatory. Python 
has never promised that hashes must be predictable or consistent, so 
apart from backwards compatibility concerns for old versions, future 
versions of Python should make it mandatory. Presuming that there is 
no speed penalty, I'd argue in favour of making it mandatory for 3.3. 
Why do we need a flag for something that is going to be always on? 


I think the whole paragraph is invalid, because it presumes there is no 
speed penalty.  I presume there will be a speed penalty, until 
benchmarking shows otherwise.


There *may* be a speed penalty, but I draw your attention to Paul McMillian's 
email on 1st of January:


Empirical testing shows that this unoptimized python
implementation produces ~10% slowdown in the hashing of
~20 character strings.

and Christian Heimes' email on 3rd of January:

The changeset adds the murmur3 hash algorithm with some
minor changes, for example more random seeds. At first I
was worried that murmur might be slower than our old hash
algorithm. But in fact it seems to be faster!


So I think that it's a fairly safe bet that there will be a solution that is 
as fast, or at worst, trivially slower, than the current hash function. But of 
course, benchmarks will be needed.




--
Steven
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2012-01-06 Thread Victor Stinner
Using my patch (random-2.patch), the overhead is 0%. I cannot see a
difference with and without my patch.

Numbers:
---
unpatched:
== 3 characters ==
1 loops, best of 3: 459 usec per loop
== 10 characters ==
1 loops, best of 3: 575 usec per loop
== 500 characters ==
1 loops, best of 3: 1.36 msec per loop

patched:

== 3 characters ==
1 loops, best of 3: 458 usec per loop
== 10 characters ==
1 loops, best of 3: 575 usec per loop
== 500 characters ==
1 loops, best of 3: 1.36 msec per loop
---
(the patched version looks faster just because the timer is not
reliable enough for such fast test)

Script:
---
echo == 3 characters ==
./python -m timeit -n 1 -s 'text=((%03i % x) for x in
range(1,1000))' 'sum(hash(x) for x in text)'
./python -m timeit -n 1 -s 'text=((%03i % x) for x in
range(1,1000))' 'sum(hash(x) for x in text)'
./python -m timeit -n 1 -s 'text=((%03i % x) for x in
range(1,1000))' 'sum(hash(x) for x in text)'

echo == 10 characters ==
./python -m timeit -n 1 -s 'text=((%010i % x) for x in
range(1,1000))' 'sum(hash(x) for x in text)'
./python -m timeit -n 1 -s 'text=((%010i % x) for x in
range(1,1000))' 'sum(hash(x) for x in text)'
./python -m timeit -n 1 -s 'text=((%010i % x) for x in
range(1,1000))' 'sum(hash(x) for x in text)'

echo == 500 characters ==
./python -m timeit -n 1 -s 'text=((%0500i % x) for x in
range(1,1000))' 'sum(hash(x) for x in text)'
./python -m timeit -n 1 -s 'text=((%0500i % x) for x in
range(1,1000))' 'sum(hash(x) for x in text)'
./python -m timeit -n 1 -s 'text=((%0500i % x)
---
(Take the smallest timing for each test)

-n 1 is needed because the hash value is only computed once (is
cached). I may be possible to have more reliable results by disabling
completly the hash cache (comment PyUnicode_HASH(self) = x; line).

Victor
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


[Python-Dev] Hash collision security issue (now public)

2012-01-06 Thread Jim Jewett
In http://mail.python.org/pipermail/python-dev/2012-January/115350.html,
Mark Shannon wrote:

 The minimal proposed change of seeding the hash from a global value (a
 single memory read and an addition) will have such a minimal performance
 effect that it will be undetectable even on the most noise-free testing
 environment.

(1)  Is it established that this (a single initial add, with no
per-loop operations) would be sufficient?

I thought that was in the gray area of We don't yet have a known
attack, but there are clearly safer options.

(2)  Even if the direct cost (fetch and add) were free, it might be
expensive in practice.  The current hash function is designed to send
similar strings (and similar numbers) to similar hashes.

(2a)  That guarantees they won't (initially) collide, even in very small dicts.
(2b)  It keeps them nearby, which has an effect on cache hits.   The
exact effect (and even direction) would of course depend on the
workload, which makes me distrust micro-benchmarks.

If this were a problem in practice, I could understand accepting a
little slowdown as the price of safety, but ... it isn't.  Even in
theory, the only way to trigger this is to take unreasonable amounts
of user input and turn it directly into an unreasonable number of keys
(as opposed to values, or list elements) placed in the same dict (as
opposed to a series of smaller dicts).

-jJ
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2012-01-06 Thread Mark Shannon

Hi,

It seems to me that half the folk discussing this issue want a 
super-strong, resist-all-hypothetical-attacks hash with little regard to 
performance. The other half want no change or a change that will have no 
 observable effect. (I may be exaggerating a little.)


Can I propose the following, half-way proposal:

1. Since there is a published vulnerability,
that we fix it with the most efficient solution proposed so far:
http://bugs.python.org/file24143/random-2.patch

2. Decide which versions of Python this should be applied to.
3.3 seems a given, the other are open to debate.

3. If and only if (and I think this unlikely) the solution chosen is 
shown to be vulnerable to a more sophisticated attack then a new issue 
should be opened and dealt with separately.


Cheers,
Mark.
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2012-01-06 Thread Paul Moore
On 6 January 2012 20:25, Mark Shannon m...@hotpy.org wrote:
 Hi,

 It seems to me that half the folk discussing this issue want a super-strong,
 resist-all-hypothetical-attacks hash with little regard to performance. The
 other half want no change or a change that will have no  observable effect.
 (I may be exaggerating a little.)

 Can I propose the following, half-way proposal:

 1. Since there is a published vulnerability,
 that we fix it with the most efficient solution proposed so far:
 http://bugs.python.org/file24143/random-2.patch

 2. Decide which versions of Python this should be applied to.
 3.3 seems a given, the other are open to debate.

 3. If and only if (and I think this unlikely) the solution chosen is shown
 to be vulnerable to a more sophisticated attack then a new issue should be
 opened and dealt with separately.

+1

Paul
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2012-01-06 Thread Glenn Linderman

On 1/5/2012 4:10 PM, Nick Coghlan wrote:

On Fri, Jan 6, 2012 at 8:15 AM, Serhiy Storchakastorch...@gmail.com  wrote:

05.01.12 21:14, Glenn Linderman написав(ла):

So, fixing the vulnerable packages could be a sufficient response,
rather than changing the hash function.  How to fix?  Each of those
above allocates and returns a dict.  Simply have each of those allocate
and return and wrapped dict, which has the following behaviors:

i) during __init__, create a local, random, string.
ii) for all key values, prepend the string, before passing it to the
internal dict.


Good idea.


Thanks for the implementation, Serhiy.  That is the sort of thing I had 
in mind, indeed.

Not a good idea - a lot of the 3rd party tests that depend on dict
ordering are going to be using those modules anyway,


Stats? Didn't someone post a list of tests  that fail when changing the 
hash? Oh, those were stdlib tests, not 3rd party tests.  I'm not sure 
how to gather the stats, then, are you?



so scattering our
solution across half the standard library is needlessly creating
additional work without really reducing the incompatibility problem.


Half the standard library?  no one has cared to augment my list of 
modules, but I have seen reference to JSON in addition to cgi and 
urllib.parse.  I think there are more than 6 modules in the standard 
library...



If we're going to change anything, it may as well be the string
hashing algorithm itself.


Changing the string hashing algorithm is known (or at least no one has 
argued otherwise) to be a source of backward incompatibility that will 
break programs.  My proposal (and Serhiy's implementation, assuming it 
works, or can be easily tweaked to work, I haven't reviewed it in detail 
or attempted to test it) will only break programs that have vulnerabilities.



I failed to mention one other benefit of my proposal: every web request 
would have a different random prefix, so attempting to gather info is 
futile: the next request has a different random prefix, so different 
strings would collide.



Cheers,
Nick.

Indeed it is nice when we can be cheery even when arguing, for the most 
part :)  I've enjoyed reading the discussions in this forum because most 
folks have respect for other people's opinions, even when they differ.
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2012-01-05 Thread Antoine Pitrou
On Thu, 5 Jan 2012 15:26:27 +1100
Andrew Bennetts and...@bemusement.org wrote:
 
 I don't think that's news either.
 http://mail.python.org/pipermail/python-dev/2003-May/035907.html and
 http://twistedmatrix.com/pipermail/twisted-python/2003-June/004339.html for
 instance show that in 2003 it was clearly known to at least be likely to be an
 exploitable DoS in common code (a dict of HTTP headers or HTTP form keys).
 
 There was debate about whether it's the language's responsibility to mitigate
 the problem or if apps should use safer designs for handling untrusted input
 (e.g. limit the number of keys input is allowed to create, or use something
 other than dicts), and debate about just how practical an effective exploit
 would be.  But I think it was understood to be a real concern 8 years ago, so
 not exactly sudden.

That's not news indeed, but that doesn't make it less of a problem,
especially now that the issue has been widely publicized through a
conference and announcements on several widely-read Web sites.

That said, only doing the security fix in 3.3 would have the nice side
effect of pushing people towards Python 3, so perhaps I'm for it after
all.

Half-jokingly,

Antoine.


___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2012-01-05 Thread Maciej Fijalkowski
On Thu, Jan 5, 2012 at 3:39 PM, Antoine Pitrou solip...@pitrou.net wrote:
 On Thu, 5 Jan 2012 15:26:27 +1100
 Andrew Bennetts and...@bemusement.org wrote:

 I don't think that's news either.
 http://mail.python.org/pipermail/python-dev/2003-May/035907.html and
 http://twistedmatrix.com/pipermail/twisted-python/2003-June/004339.html for
 instance show that in 2003 it was clearly known to at least be likely to be 
 an
 exploitable DoS in common code (a dict of HTTP headers or HTTP form keys).

 There was debate about whether it's the language's responsibility to mitigate
 the problem or if apps should use safer designs for handling untrusted input
 (e.g. limit the number of keys input is allowed to create, or use something
 other than dicts), and debate about just how practical an effective exploit
 would be.  But I think it was understood to be a real concern 8 years ago, so
 not exactly sudden.

 That's not news indeed, but that doesn't make it less of a problem,
 especially now that the issue has been widely publicized through a
 conference and announcements on several widely-read Web sites.

 That said, only doing the security fix in 3.3 would have the nice side
 effect of pushing people towards Python 3, so perhaps I'm for it after
 all.

 Half-jokingly,

 Antoine.


 ___
 Python-Dev mailing list
 Python-Dev@python.org
 http://mail.python.org/mailman/listinfo/python-dev
 Unsubscribe: 
 http://mail.python.org/mailman/options/python-dev/fijall%40gmail.com

Just to make things clear - stdlib itself has 1/64 of tests relying on
dict order. Changing dict order in *older* pythons will break
everyone's tests and some peoples code. Making this new 2.6.x release
would mean that people using new python 2.6 would have to upgrade an
unspecified amount of their python packages, that does not sound very
cool. Also consider that new 2.6.x would go as a security fix to old
ubuntu, but all other packages won't, because they'll not contain
security fixes. Just so you know

Cheers,
fijal
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2012-01-05 Thread Glenn Linderman

On 1/5/2012 9:34 AM, Maciej Fijalkowski wrote:

Also consider that new 2.6.x would go as a security fix to old
ubuntu, but all other packages won't, because they'll not contain
security fixes. Just so you know


Why should CPython by constrained by broken policies of Ubuntu?  If the 
other packages must be fixed so they work correctly with a security fix 
in Python, then they should be considered as containing a security fix. 
If they aren't, then that is a broken policy.


On the other hand, it is very true that the seductive convenience of 
dict (readily available, good performance) in normal cases have created 
the vulnerability because its characteristics are a function of the data 
inserted, and when used for data that is from unknown, possibly 
malicious sources, that is a bug in the program that uses dict, not in 
dict itself.


So it seems to me that:

1) the security problem is not in CPython, but rather in web servers 
that use dict inappropriately.
2) changing CPython in a way that breaks code is not a security fix to 
CPython, but rather a gratuitous breakage of compatibility promises, 
wrapped in a security-fix lie.


The problem for CPython here can be summarized as follows:

a) it is being blamed for problems in web servers that are not problems 
in CPython
b) perhaps dict documentation is a bit too seductive, in not declaring 
that data from malicious sources could cause its performance to degrade 
significantly (but then, any programmer who has actually taken a decent 
set of programming classes should understand that, but on the other 
hand, there are programmers who have not taken such classes).
c) CPython provides no other mapping data structures that rival the 
performance and capabilities of dict as an alternative, nor can such 
data structures be written in CPython, as the performance of dict comes 
not only from hashing, but also from being written in C.


The solutions could be:

A) push back on the blame: it is not a CPython problem
B) perhaps add a warning to the documentation for the naïve, untrained 
programmers
C) consider adding an additional data structure to the language, and 
mention it in the B warning for versions 3.3+.


On the other hand, the web server vulnerability could be blamed on 
CPython in another way:


identify vulnerable packages in the stdlib that are likely the be used 
during the parsing of user-supplied data.  Ones that come to mind 
(Python 3.2) are:
urllib.parse (various parse* functions)  (package names different in 
Python 2.x)

cgi (parse_multipart, FieldStorage)

So, fixing the vulnerable packages could be a sufficient response, 
rather than changing the hash function.  How to fix?  Each of those 
above allocates and returns a dict.  Simply have each of those allocate 
and return and wrapped dict, which has the following behaviors:


i) during __init__, create a local, random, string.
ii) for all key values, prepend the string, before passing it to the 
internal dict.


Changing these vulnerable packages rather than the hash function is a 
much more constrained change, and wouldn't create bugs in programs that 
erroneously depend on the current hash function directly or indirectly.


This would not fix web servers that use their own parsing and storage 
mechanism for FORM fields, if they have also inappropriately used a 
dict as their storage mechanism for user supplied data.  However, a 
similar solution could be similarly applied by the authors of those web 
servers, and would be a security fix to such packages, so should be 
applied to Ubuntu, if available there, or other systems with 
security-only fix acceptance.


This solution does not require changes to the hash, does not require a 
cryptographicly secure hash, and does not require code to be added to 
the initialization of Python before normal objects and mappings can be 
created.


If a port doesn't contain a good random number generator, a weak one can 
be subsitituted, but such decisions can be made in Python code after the 
interpreter is initialized, and use of stdlib packages is available.
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2012-01-05 Thread Antoine Pitrou
On Thu, 5 Jan 2012 19:34:13 +0200
Maciej Fijalkowski fij...@gmail.com wrote:
 
 Just to make things clear - stdlib itself has 1/64 of tests relying on
 dict order. Changing dict order in *older* pythons will break
 everyone's tests and some peoples code.

Breaking tests is not a problem: they are typically not run by
production code and so people can take the time to fix them.

Breaking other code is a problem if it is legitimate. Relying on dict
ordering is totally wrong and I don't think we should care about such
cases. The only issue is when relying on hash() being stable accross
runs. But hashing already varies from build to build (32-bit vs.
64-bit) and I think that anyone seriously relying on it should already
have been bitten.

 Making this new 2.6.x release
 would mean that people using new python 2.6 would have to upgrade an
 unspecified amount of their python packages, that does not sound very
 cool.

How about 2.7? Do you think it should also remain untouched?
I am ok for leaving 2.6 alone (that's Barry's call anyway) but 2.7 is
another matter - should people migrate to 3.x to get the security fix?

As for 3.2, it should certainly get the fix IMO. There are not many
Python 3 legacy applications relying on hash() stability, I think.

 Also consider that new 2.6.x would go as a security fix to old
 ubuntu, but all other packages won't, because they'll not contain
 security fixes.

Ubuntu can decide *not* to ship the fix if they prefer it like that.
Their policies and decisions, though, should not taint ours.

Regards

Antoine.
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2012-01-05 Thread David Malcolm
On Thu, 2012-01-05 at 19:34 +0200, Maciej Fijalkowski wrote:
 On Thu, Jan 5, 2012 at 3:39 PM, Antoine Pitrou solip...@pitrou.net wrote:
  On Thu, 5 Jan 2012 15:26:27 +1100
  Andrew Bennetts and...@bemusement.org wrote:
 
  I don't think that's news either.
  http://mail.python.org/pipermail/python-dev/2003-May/035907.html and
  http://twistedmatrix.com/pipermail/twisted-python/2003-June/004339.html for
  instance show that in 2003 it was clearly known to at least be likely to 
  be an
  exploitable DoS in common code (a dict of HTTP headers or HTTP form keys).
 
  There was debate about whether it's the language's responsibility to 
  mitigate
  the problem or if apps should use safer designs for handling untrusted 
  input
  (e.g. limit the number of keys input is allowed to create, or use something
  other than dicts), and debate about just how practical an effective exploit
  would be.  But I think it was understood to be a real concern 8 years ago, 
  so
  not exactly sudden.
 
  That's not news indeed, but that doesn't make it less of a problem,
  especially now that the issue has been widely publicized through a
  conference and announcements on several widely-read Web sites.
 
  That said, only doing the security fix in 3.3 would have the nice side
  effect of pushing people towards Python 3, so perhaps I'm for it after
  all.
 
  Half-jokingly,
 
  Antoine.

 
 Just to make things clear - stdlib itself has 1/64 of tests relying on
 dict order. Changing dict order in *older* pythons will break
 everyone's tests and some peoples code. Making this new 2.6.x release
 would mean that people using new python 2.6 would have to upgrade an
 unspecified amount of their python packages, that does not sound very
 cool. Also consider that new 2.6.x would go as a security fix to old
 ubuntu, but all other packages won't, because they'll not contain
 security fixes. Just so you know

We have similar issues in RHEL, with the Python versions going much
further back (e.g. 2.3)

When backporting the fix to ancient python versions, I'm inclined to
turn the change *off* by default, requiring the change to be enabled via
an environment variable: I want to avoid breaking existing code, even if
such code is technically relying on non-guaranteed behavior.  But we
could potentially tweak mod_python/mod_wsgi so that it defaults to *on*.
That way /usr/bin/python would default to the old behavior, but web apps
would have some protection.   Any such logic here also suggests the need
for an attribute in the sys module so that you can verify the behavior.

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2012-01-05 Thread Tres Seaver
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 01/05/2012 02:14 PM, Glenn Linderman wrote:
 1) the security problem is not in CPython, but rather in web servers 
 that use dict inappropriately.

Most webapp vulnerabilities are due to their use of Python's cgi module,
which it uses a dict to hold the form / query string data being supplied
by untrusted external users.



Tres.
- -- 
===
Tres Seaver  +1 540-429-0999  tsea...@palladion.com
Palladion Software   Excellence by Designhttp://palladion.com
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk8F/uEACgkQ+gerLs4ltQ679QCgqKPYYwEetKR3bEMVh5eukLin
cA8An3XJMYWhK5MutjbOCxCfYzKXmDzc
=V3lh
-END PGP SIGNATURE-

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2012-01-05 Thread Glenn Linderman

On 1/5/2012 11:49 AM, Tres Seaver wrote:

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 01/05/2012 02:14 PM, Glenn Linderman wrote:

1) the security problem is not in CPython, but rather in web servers
that use dict inappropriately.

Most webapp vulnerabilities are due to their use of Python's cgi module,
which it uses a dict to hold the form / query string data being supplied
by untrusted external users.


Yes, I understand that (and have some such web apps in production).

In fact, I pointed out urllib.parse and cgi as specific modules for 
which a proposed fix could be made without impacting the Python hash 
function.
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2012-01-05 Thread Ethan Furman

Tres Seaver wrote:

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 01/05/2012 02:14 PM, Glenn Linderman wrote:
1) the security problem is not in CPython, but rather in web servers 
that use dict inappropriately.


Most webapp vulnerabilities are due to their use of Python's cgi module,
which it uses a dict to hold the form / query string data being supplied
by untrusted external users.


And Glenn suggested further down that an appropriate course of action 
would be to fix the cgi module (and others) instead of messing with dict.


~Ethan~
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2012-01-05 Thread Paul Moore
On 5 January 2012 19:33, David Malcolm dmalc...@redhat.com wrote:
 We have similar issues in RHEL, with the Python versions going much
 further back (e.g. 2.3)

 When backporting the fix to ancient python versions, I'm inclined to
 turn the change *off* by default, requiring the change to be enabled via
 an environment variable: I want to avoid breaking existing code, even if
 such code is technically relying on non-guaranteed behavior.  But we
 could potentially tweak mod_python/mod_wsgi so that it defaults to *on*.
 That way /usr/bin/python would default to the old behavior, but web apps
 would have some protection.   Any such logic here also suggests the need
 for an attribute in the sys module so that you can verify the behavior.

Uh, surely no-one is suggesting backporting to ancient versions? I
couldn't find the statement quickly on the python.org website (so this
is via google), but isn't it true that 2.6 is in security-only mode
and 2.5 and earlier will never get the fix? Having a source-only
release for 2.6 means the fix is off by default in the sense that
you can choose not to build it. Or add a #ifdef to the source if it
really matters.

Personally, I find it hard to see this as a Python security hole, but
I can sympathise with the idea that it would be nice to make dict
safer by default. (Although the benefit for me personally would be
zero, so I'm reluctant for the change to have a detectable cost...)

My feeling is that it should go into 2.7, 3.2, and 3.3+, but with no
bells and whistles to switch it off or the like. If it's not suitable
to go in on that basis, restrict it to 3.3+ (where it's certainly OK)
and advise users of earlier versions to either upgrade or code
defensively to avoid hitting the pathological case. Surely that sort
of defensive code should be second nature to the people who might be
affected by the issue?

Paul.
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2012-01-05 Thread Barry Warsaw
On Jan 05, 2012, at 02:33 PM, David Malcolm wrote:

We have similar issues in RHEL, with the Python versions going much
further back (e.g. 2.3)

When backporting the fix to ancient python versions, I'm inclined to
turn the change *off* by default, requiring the change to be enabled via
an environment variable: I want to avoid breaking existing code, even if
such code is technically relying on non-guaranteed behavior.  But we
could potentially tweak mod_python/mod_wsgi so that it defaults to *on*.
That way /usr/bin/python would default to the old behavior, but web apps
would have some protection.

This sounds like a reasonable compromise for all stable Python releases.  It
can be turned on by default for Python 3.3.  If you also make the default
setting easy to change (i.e. parameterized in one place), then distros can
make their own decision about the default, although I'd argue for the above
default approach for Debian/Ubuntu.

Any such logic here also suggests the need for an attribute in the sys module
so that you can verify the behavior.

That would be read-only though, right?

-Barry
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2012-01-05 Thread Barry Warsaw
On Jan 05, 2012, at 08:35 PM, Paul Moore wrote:

Uh, surely no-one is suggesting backporting to ancient versions? I
couldn't find the statement quickly on the python.org website (so this
is via google), but isn't it true that 2.6 is in security-only mode
and 2.5 and earlier will never get the fix? Having a source-only
release for 2.6 means the fix is off by default in the sense that
you can choose not to build it. Or add a #ifdef to the source if it
really matters.

Correct, although there's no reason why a patch for versions older than 2.6
couldn't be included on a python.org security page for reference in CVE or
other security notifications.  Distros that care about versions older than
Python 2.6 will basically be back-porting the patch anyway.

My feeling is that it should go into 2.7, 3.2, and 3.3+, but with no
bells and whistles to switch it off or the like.

I like David Malcolm's suggestion, but I have no problem applying it to 3.3,
enabled by default with no way to turn it off.  The off-by-default on-switch
policy for stable releases would be justified by maximum backward
compatibility conservativeness.

-Barry
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2012-01-05 Thread Toshio Kuratomi
On Thu, Jan 05, 2012 at 08:35:57PM +, Paul Moore wrote:
 On 5 January 2012 19:33, David Malcolm dmalc...@redhat.com wrote:
  We have similar issues in RHEL, with the Python versions going much
  further back (e.g. 2.3)
 
  When backporting the fix to ancient python versions, I'm inclined to
  turn the change *off* by default, requiring the change to be enabled via
  an environment variable: I want to avoid breaking existing code, even if
  such code is technically relying on non-guaranteed behavior.  But we
  could potentially tweak mod_python/mod_wsgi so that it defaults to *on*.
  That way /usr/bin/python would default to the old behavior, but web apps
  would have some protection.   Any such logic here also suggests the need
  for an attribute in the sys module so that you can verify the behavior.
 
 Uh, surely no-one is suggesting backporting to ancient versions? I
 couldn't find the statement quickly on the python.org website (so this
 is via google), but isn't it true that 2.6 is in security-only mode
 and 2.5 and earlier will never get the fix?

I think when dmalcolm says backporting he means that he'll have to
backport the fix from modern, supported-by-python.org python to the ancient
python's that he's supporting as part of the Linux distributions where he's
the python package maintainer.

I'm thinking he's mentioning it here mainly to see if someone thinks that
his approach for those distributions causes anyone to point out a reason not
to diverge from upstream in that manner.

 Having a source-only
 release for 2.6 means the fix is off by default in the sense that
 you can choose not to build it. Or add a #ifdef to the source if it
 really matters.
 
I don't think that this would satisfy dmalcolm's needs.  What he's talking
about sounds more like a runtime switch (possibly only when initializing,
though, not on-the-fly).

-Toshio


pgp7qk95cGJ9b.pgp
Description: PGP signature
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2012-01-05 Thread David Malcolm
On Thu, 2012-01-05 at 20:35 +, Paul Moore wrote:
 On 5 January 2012 19:33, David Malcolm dmalc...@redhat.com wrote:
  We have similar issues in RHEL, with the Python versions going much
  further back (e.g. 2.3)
 
  When backporting the fix to ancient python versions, I'm inclined to
  turn the change *off* by default, requiring the change to be enabled via
  an environment variable: I want to avoid breaking existing code, even if
  such code is technically relying on non-guaranteed behavior.  But we
  could potentially tweak mod_python/mod_wsgi so that it defaults to *on*.
  That way /usr/bin/python would default to the old behavior, but web apps
  would have some protection.   Any such logic here also suggests the need
  for an attribute in the sys module so that you can verify the behavior.
 
 Uh, surely no-one is suggesting backporting to ancient versions? I
 couldn't find the statement quickly on the python.org website (so this
 is via google), but isn't it true that 2.6 is in security-only mode
 and 2.5 and earlier will never get the fix? Having a source-only
 release for 2.6 means the fix is off by default in the sense that
 you can choose not to build it. Or add a #ifdef to the source if it
 really matters.
Sorry, if I was unclear.   I don't expect python-dev to do this
backporting, but those of us who do maintain such ancient pythons via
Linux distributions may want to do the backport for our users.  My email
was to note that it may make sense to pick more conservative defaults
for such a scenario, as compared to 2.6 onwards.

[snip]

Hope this is helpful
Dave

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2012-01-05 Thread Georg Brandl
On 01/05/2012 09:45 PM, Barry Warsaw wrote:
 On Jan 05, 2012, at 02:33 PM, David Malcolm wrote:
 
We have similar issues in RHEL, with the Python versions going much
further back (e.g. 2.3)

When backporting the fix to ancient python versions, I'm inclined to
turn the change *off* by default, requiring the change to be enabled via
an environment variable: I want to avoid breaking existing code, even if
such code is technically relying on non-guaranteed behavior.  But we
could potentially tweak mod_python/mod_wsgi so that it defaults to *on*.
That way /usr/bin/python would default to the old behavior, but web apps
would have some protection.
 
 This sounds like a reasonable compromise for all stable Python releases.  It
 can be turned on by default for Python 3.3.  If you also make the default
 setting easy to change (i.e. parameterized in one place), then distros can
 make their own decision about the default, although I'd argue for the above
 default approach for Debian/Ubuntu.

Agreed.

Georg

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2012-01-05 Thread Christian Heimes
Am 05.01.2012 21:45, schrieb Barry Warsaw:
 This sounds like a reasonable compromise for all stable Python releases.  It
 can be turned on by default for Python 3.3.  If you also make the default
 setting easy to change (i.e. parameterized in one place), then distros can
 make their own decision about the default, although I'd argue for the above
 default approach for Debian/Ubuntu.

Hey Barry, stop stealing my ideas! :) I've argued for these default
settings for days.

ver deliveryrandomized hashing
==
2.3 patch   disabled by default
2.4 patch   disabled
2.5 patch   disabled
2.6 release disabled
2.7 release disabled
3.0 ignore? disabled
3.1 release disabled
3.2 release disabled
3.3 n/a yet enabled by default

2.3 to 2.5 are still used in production (RHEL, Ubuntu LTS). Guido has
stated that he needs a patch for 2.4, too. I think we may safely ignore
Python 3.0. Nobody should use Python 3.0 on a production system.

I've suggested the env var PYRANDOMHASH. It's easy to set env vars in
Apache. For example Debian/Ubuntu has /etc/apache2/envvars.

Settings for PYRANDOMHASH:

 PYRANDOMHASH=1
   enable randomized hashing function

 PYRANDOMHASH=/path/to/seed
   enable randomized hashing function and read seed from 'seed'

 PYRANDOMHASH=0
   disable randomed hashing function

Since there isn't an easy way to set env vars in a shebang line since
something like

  #!/usr/bin/env PYRANDOMHASH=1 python2.7

doesn't work, we could come up with a solution the shebang.


IMHO the setting for the default setting should be a compile time
option. It's reasonable easy to extend the configure script to support
--enable-randomhash / --disable-randomhash. The MS VC build scripts can
grow a flag, too.


I still think that the topic needs a PEP. A couple of days ago I started
with a PEP. But Guido told me that he doesn't see a point in a PEP
because he prefers a small and quick solution, so I stopped working on
it. However the arguments, worries and ideas in this enormous topic have
repeated over and over. We know from experience that a PEP is a great
way to explain the how, what and why of the change as well as the paths
we didn't take.

Christian
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2012-01-05 Thread Christian Heimes
Am 05.01.2012 21:10, schrieb Ethan Furman:
 Tres Seaver wrote:
 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA1

 On 01/05/2012 02:14 PM, Glenn Linderman wrote:
 1) the security problem is not in CPython, but rather in web servers
 that use dict inappropriately.

 Most webapp vulnerabilities are due to their use of Python's cgi module,
 which it uses a dict to hold the form / query string data being supplied
 by untrusted external users.
 
 And Glenn suggested further down that an appropriate course of action
 would be to fix the cgi module (and others) instead of messing with dict.

You'd have to fix any Python core module that may handle data from
untrusted sources. The issue isn't limited to web apps and POST
requests. It's possible to trigger the DoS from JSON, a malicious PDF,
JPEG's EXIF metadata or any other data.

Oh, and somebody has to fix all 3rd party modules, too.

Christian
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2012-01-05 Thread Antoine Pitrou
On Thu, 05 Jan 2012 22:40:58 +0100
Christian Heimes li...@cheimes.de wrote:
 Am 05.01.2012 21:45, schrieb Barry Warsaw:
  This sounds like a reasonable compromise for all stable Python releases.  It
  can be turned on by default for Python 3.3.  If you also make the default
  setting easy to change (i.e. parameterized in one place), then distros can
  make their own decision about the default, although I'd argue for the above
  default approach for Debian/Ubuntu.
 
 Hey Barry, stop stealing my ideas! :) I've argued for these default
 settings for days.
 
 ver   deliveryrandomized hashing
 ==
 2.3   patch   disabled by default
 2.4   patch   disabled
 2.5   patch   disabled
 2.6   release disabled
 2.7   release disabled
 3.0   ignore? disabled
 3.1   release disabled
 3.2   release disabled
 3.3   n/a yet enabled by default

I don't think we (python-dev) are really concerned with 2.3, 2.4,
2.5 and 3.0.  They're all unsupported, and people do what they want
with their local source trees.

Regards

Antoine.


___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2012-01-05 Thread Christian Heimes
Am 05.01.2012 22:59, schrieb Antoine Pitrou:
 I don't think we (python-dev) are really concerned with 2.3, 2.4,
 2.5 and 3.0.  They're all unsupported, and people do what they want
 with their local source trees.

Let me reply with a quote from Barry:

 Correct, although there's no reason why a patch for versions
 older than 2.6 couldn't be included on a python.org security
 page for reference in CVE or other security notifications.
 Distros that care about versions older than Python 2.6 will
 basically be back-porting the patch anyway.

Christian
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2012-01-05 Thread Serhiy Storchaka

05.01.12 21:14, Glenn Linderman написав(ла):

So, fixing the vulnerable packages could be a sufficient response,
rather than changing the hash function.  How to fix?  Each of those
above allocates and returns a dict.  Simply have each of those allocate
and return and wrapped dict, which has the following behaviors:

i) during __init__, create a local, random, string.
ii) for all key values, prepend the string, before passing it to the
internal dict.


Good idea.
# -*- coding: utf-8 -*-
from collections import MutableMapping
import random


class SafeDict(dict, MutableMapping):

def __init__(self, *args, **kwds):
dict.__init__(self)
self._prefix = str(random.getrandbits(64))
self.update(*args, **kwds)

def clear(self):
dict.clear(self)
self._prefix = str(random.getrandbits(64))

def _safe_key(self, key):
return self._prefix + repr(key), key

def __getitem__(self, key):
try:
return dict.__getitem__(self, self._safe_key(key))
except KeyError as e:
e.args = (key,)
raise e

def __setitem__(self, key, value):
dict.__setitem__(self, self._safe_key(key), value)

def __delitem__(self, key):
try:
dict.__delitem__(self, self._safe_key(key))
except KeyError as e:
e.args = (key,)
raise e

def __iter__(self):
for skey, key in dict.__iter__(self):
yield key

def __contains__(self, key):
return dict.__contains__(self, self._safe_key(key))

setdefault = MutableMapping.setdefault
update = MutableMapping.update
pop = MutableMapping.pop
popitem = MutableMapping.popitem
keys = MutableMapping.keys
values = MutableMapping.values
items = MutableMapping.items

def __repr__(self):
return '{%s}' % ', '.join('%s: %s' % (repr(k), repr(v))
for k, v in self.items())

def copy(self):
return self.__class__(self)

@classmethod
def fromkeys(cls, iterable, value=None):
d = cls()
for key in iterable:
d[key] = value
return d

def __eq__(self, other):
return all(k in other and other[k] == v for k, v in self.items()) and \
all(k in self and self[k] == v for k, v in other.items())

def __ne__(self, other):
return not self == other
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2012-01-05 Thread Nick Coghlan
On Fri, Jan 6, 2012 at 8:15 AM, Serhiy Storchaka storch...@gmail.com wrote:
 05.01.12 21:14, Glenn Linderman написав(ла):

 So, fixing the vulnerable packages could be a sufficient response,
 rather than changing the hash function.  How to fix?  Each of those
 above allocates and returns a dict.  Simply have each of those allocate
 and return and wrapped dict, which has the following behaviors:

 i) during __init__, create a local, random, string.
 ii) for all key values, prepend the string, before passing it to the
 internal dict.


 Good idea.

Not a good idea - a lot of the 3rd party tests that depend on dict
ordering are going to be using those modules anyway, so scattering our
solution across half the standard library is needlessly creating
additional work without really reducing the incompatibility problem.
If we're going to change anything, it may as well be the string
hashing algorithm itself.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2012-01-05 Thread Terry Reedy

On 1/5/2012 3:10 PM, Ethan Furman wrote:

Tres Seaver wrote:



1) the security problem is not in CPython, but rather in web servers
that use dict inappropriately.


Most webapp vulnerabilities are due to their use of Python's cgi module,
which it uses a dict to hold the form / query string data being supplied
by untrusted external users.


And Glenn suggested further down that an appropriate course of action
would be to fix the cgi module (and others) instead of messing with dict.


I think both should be done. For web applications, it would be best to 
reject DOS attempts with 'random' keys in O(1) time rather than in O(n) 
time even with improved hash. But some other apps, like the Python 
interpreter itself, 'random' names may be quite normal.


--
Terry Jan Reedy

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2012-01-05 Thread Steven D'Aprano

David Malcolm wrote:


When backporting the fix to ancient python versions, I'm inclined to
turn the change *off* by default, requiring the change to be enabled via
an environment variable: I want to avoid breaking existing code, even if
such code is technically relying on non-guaranteed behavior.  But we
could potentially tweak mod_python/mod_wsgi so that it defaults to *on*.
That way /usr/bin/python would default to the old behavior, but web apps
would have some protection.   Any such logic here also suggests the need
for an attribute in the sys module so that you can verify the behavior.


Surely the way to verify the behaviour is to run this from the shell:

python -c print(hash(abcde))

twice, and see that the calls return different values. (Or have I 
misunderstood the way the fix is going to work?)


In any case, I wouldn't want to rely on the presence of a flag in the sys 
module to verify the behaviour, I'd want to see for myself that hash 
collisions are no longer predictable.




--
Steven

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2012-01-05 Thread Barry Warsaw
On Jan 05, 2012, at 10:40 PM, Christian Heimes wrote:

Hey Barry, stop stealing my ideas! :) I've argued for these default
settings for days.

:)

I've suggested the env var PYRANDOMHASH. It's easy to set env vars in
Apache. For example Debian/Ubuntu has /etc/apache2/envvars.

For consistency, it really should be PYTHONSOMETHING.  I personally don't care
how long it is (e.g. PYTHONIOENCODING).

Settings for PYRANDOMHASH:

 PYRANDOMHASH=1
   enable randomized hashing function

 PYRANDOMHASH=/path/to/seed
   enable randomized hashing function and read seed from 'seed'

 PYRANDOMHASH=0
   disable randomed hashing function

Why not PYTHONHASHSEED then?

Since there isn't an easy way to set env vars in a shebang line since
something like

  #!/usr/bin/env PYRANDOMHASH=1 python2.7

doesn't work, we could come up with a solution the shebang.

We have precedence for mirroring startup options and envars, so it doesn't
bother me to add such a switch to Python 3.3.  It *does* bother me to add a
switch to any stable release.

IMHO the setting for the default setting should be a compile time
option. It's reasonable easy to extend the configure script to support
--enable-randomhash / --disable-randomhash. The MS VC build scripts can
grow a flag, too.

I still think that the topic needs a PEP. A couple of days ago I started
with a PEP. But Guido told me that he doesn't see a point in a PEP
because he prefers a small and quick solution, so I stopped working on
it. However the arguments, worries and ideas in this enormous topic have
repeated over and over. We know from experience that a PEP is a great
way to explain the how, what and why of the change as well as the paths
we didn't take.

One way to look at it is to have a quick-and-dirty solution for stable
releases.  It could be suboptimal from a ui point of view because of backward
compatibility issues.  The PEP could then outline the boffo perfect solution
for Python 3.3, which a section on how it will be backported to stable
releases.

Cheers,
-Barry


signature.asc
Description: PGP signature
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2012-01-05 Thread Nick Coghlan
On Fri, Jan 6, 2012 at 10:07 AM, Steven D'Aprano st...@pearwood.info wrote:
 Surely the way to verify the behaviour is to run this from the shell:

 python -c print(hash(abcde))

 twice, and see that the calls return different values. (Or have I
 misunderstood the way the fix is going to work?)

 In any case, I wouldn't want to rely on the presence of a flag in the sys
 module to verify the behaviour, I'd want to see for myself that hash
 collisions are no longer predictable.

More directly, you can just check that the hash of the empty string is non-zero.

So -1 for a flag in the sys module - hash('') != 0 should serve as a
sufficient check whether or not process-level string hash
randomisation is in effect.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2012-01-05 Thread Victor Stinner
2012/1/6 Barry Warsaw ba...@python.org:
Settings for PYRANDOMHASH:

 PYRANDOMHASH=1
   enable randomized hashing function

 PYRANDOMHASH=/path/to/seed
   enable randomized hashing function and read seed from 'seed'

 PYRANDOMHASH=0
   disable randomed hashing function

 Why not PYTHONHASHSEED then?

See my patch attached to the issue #13703? I prepared the code to be
able to set easily the hash seed (it has a LCG, it's seed can be
provided by the user directly). I agree that the value 0 should give
the same behaviour than the actual hash (disable the randomized hash).
I will add the variable in the next version of my patch.
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2012-01-05 Thread Christian Heimes
Am 06.01.2012 01:34, schrieb Nick Coghlan:
 On Fri, Jan 6, 2012 at 10:07 AM, Steven D'Aprano st...@pearwood.info wrote:
 Surely the way to verify the behaviour is to run this from the shell:

 python -c print(hash(abcde))

 twice, and see that the calls return different values. (Or have I
 misunderstood the way the fix is going to work?)

 In any case, I wouldn't want to rely on the presence of a flag in the sys
 module to verify the behaviour, I'd want to see for myself that hash
 collisions are no longer predictable.
 
 More directly, you can just check that the hash of the empty string is 
 non-zero.
 
 So -1 for a flag in the sys module - hash('') != 0 should serve as a
 sufficient check whether or not process-level string hash
 randomisation is in effect.

This might not work as we have to special case empty strings and perhaps
\0 strings, too. Otherwise we would give away the random seed to an
attacker if an attacker can somehow get hold of hash('') or hash(n * '\0').

Christian
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2012-01-05 Thread Benjamin Peterson
2012/1/5 Nick Coghlan ncogh...@gmail.com:
 On Fri, Jan 6, 2012 at 10:07 AM, Steven D'Aprano st...@pearwood.info wrote:
 Surely the way to verify the behaviour is to run this from the shell:

 python -c print(hash(abcde))

 twice, and see that the calls return different values. (Or have I
 misunderstood the way the fix is going to work?)

 In any case, I wouldn't want to rely on the presence of a flag in the sys
 module to verify the behaviour, I'd want to see for myself that hash
 collisions are no longer predictable.

 More directly, you can just check that the hash of the empty string is 
 non-zero.

 So -1 for a flag in the sys module - hash('') != 0 should serve as a
 sufficient check whether or not process-level string hash
 randomisation is in effect.

What exactly is the disadvantage of a sys attribute? That would seem
preferable to an obscure incarnation like that.



-- 
Regards,
Benjamin
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2012-01-05 Thread Antoine Pitrou
On Fri, 06 Jan 2012 01:50:00 +0100
Christian Heimes li...@cheimes.de wrote:
 Am 06.01.2012 01:34, schrieb Nick Coghlan:
  On Fri, Jan 6, 2012 at 10:07 AM, Steven D'Aprano st...@pearwood.info 
  wrote:
  Surely the way to verify the behaviour is to run this from the shell:
 
  python -c print(hash(abcde))
 
  twice, and see that the calls return different values. (Or have I
  misunderstood the way the fix is going to work?)
 
  In any case, I wouldn't want to rely on the presence of a flag in the sys
  module to verify the behaviour, I'd want to see for myself that hash
  collisions are no longer predictable.
  
  More directly, you can just check that the hash of the empty string is 
  non-zero.
  
  So -1 for a flag in the sys module - hash('') != 0 should serve as a
  sufficient check whether or not process-level string hash
  randomisation is in effect.
 
 This might not work as we have to special case empty strings and perhaps
 \0 strings, too.

The special case value doesn't have to be zero. Make it age(Barry) for
example (which, I think, is still representable in a 32-bit integer!).

Regards

Antoine.


___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2012-01-05 Thread Nick Coghlan
On Fri, Jan 6, 2012 at 10:59 AM, Benjamin Peterson benja...@python.org wrote:
 What exactly is the disadvantage of a sys attribute? That would seem
 preferable to an obscure incarnation like that.

Adding sys attributes in maintenance (or security) releases makes me nervous.

However, Victor and Christian are right about the need for a special
case to avoid leaking information, so my particular suggested check
won't work.

The most robust check would be to run sys.executable in a subprocess
and check if it gives the same hash for a non-empty string as the
current process.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2012-01-05 Thread Steven D'Aprano

Benjamin Peterson wrote:

2012/1/5 Nick Coghlan ncogh...@gmail.com:

On Fri, Jan 6, 2012 at 10:07 AM, Steven D'Aprano st...@pearwood.info wrote:

Surely the way to verify the behaviour is to run this from the shell:

python -c print(hash(abcde))

twice, and see that the calls return different values. (Or have I
misunderstood the way the fix is going to work?)

In any case, I wouldn't want to rely on the presence of a flag in the sys
module to verify the behaviour, I'd want to see for myself that hash
collisions are no longer predictable.

More directly, you can just check that the hash of the empty string is non-zero.

So -1 for a flag in the sys module - hash('') != 0 should serve as a
sufficient check whether or not process-level string hash
randomisation is in effect.


What exactly is the disadvantage of a sys attribute? That would seem
preferable to an obscure incarnation like that.


There's nothing obscure about directly testing the hash. That's about as far 
from obscure as it is possible to get: you are directly testing the presence 
of a feature by testing the feature.


Relying on a flag to tell you whether hashes are randomised adds additional 
complexity: now you need to care about whether hashes are randomised AND know 
that there is a flag you can look up and what it is called.


And since the flag won't exist in all versions of Python, or even in all 
builds of a particular Python version, it isn't a matter of just testing the 
flag, but of doing the try...except or hasattr() dance to check whether it 
exists first.


At some point, presuming that there is no speed penalty, the behaviour will 
surely become not just enabled by default but mandatory. Python has never 
promised that hashes must be predictable or consistent, so apart from 
backwards compatibility concerns for old versions, future versions of Python 
should make it mandatory. Presuming that there is no speed penalty, I'd argue 
in favour of making it mandatory for 3.3. Why do we need a flag for something 
that is going to be always on?




--
Steven
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2012-01-05 Thread Benjamin Peterson
2012/1/5 Steven D'Aprano st...@pearwood.info:
 Benjamin Peterson wrote:

 2012/1/5 Nick Coghlan ncogh...@gmail.com:

 On Fri, Jan 6, 2012 at 10:07 AM, Steven D'Aprano st...@pearwood.info
 wrote:

 Surely the way to verify the behaviour is to run this from the shell:

 python -c print(hash(abcde))

 twice, and see that the calls return different values. (Or have I
 misunderstood the way the fix is going to work?)

 In any case, I wouldn't want to rely on the presence of a flag in the
 sys
 module to verify the behaviour, I'd want to see for myself that hash
 collisions are no longer predictable.

 More directly, you can just check that the hash of the empty string is
 non-zero.

 So -1 for a flag in the sys module - hash('') != 0 should serve as a
 sufficient check whether or not process-level string hash
 randomisation is in effect.


 What exactly is the disadvantage of a sys attribute? That would seem
 preferable to an obscure incarnation like that.


 There's nothing obscure about directly testing the hash. That's about as far
 from obscure as it is possible to get: you are directly testing the presence
 of a feature by testing the feature.

It's obscure because hash('') != 0 doesn't necessarily mean the hashes
are randomized. A different hashing algorithm could be in effect.


-- 
Regards,
Benjamin
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2012-01-05 Thread Christian Heimes
Am 06.01.2012 03:04, schrieb Benjamin Peterson:
 It's obscure because hash('') != 0 doesn't necessarily mean the hashes
 are randomized. A different hashing algorithm could be in effect.

Also in 1 of 2**32 or 2**64 tries hash('') is 0 although randomizing is
active.

Christian
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2012-01-05 Thread Steven D'Aprano

Benjamin Peterson wrote:

2012/1/5 Steven D'Aprano st...@pearwood.info:

[...]

There's nothing obscure about directly testing the hash. That's about as far
from obscure as it is possible to get: you are directly testing the presence
of a feature by testing the feature.


It's obscure because hash('') != 0 doesn't necessarily mean the hashes
are randomized. A different hashing algorithm could be in effect.


Fair point, but I didn't actually suggest testing hash('') != 0, that was 
Nick's suggestion, which he's since withdrawn.


--
Steven


___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2012-01-05 Thread Glenn Linderman

On 1/5/2012 5:52 PM, Steven D'Aprano wrote:


At some point, presuming that there is no speed penalty, the behaviour 
will surely become not just enabled by default but mandatory. Python 
has never promised that hashes must be predictable or consistent, so 
apart from backwards compatibility concerns for old versions, future 
versions of Python should make it mandatory. Presuming that there is 
no speed penalty, I'd argue in favour of making it mandatory for 3.3. 
Why do we need a flag for something that is going to be always on? 


I think the whole paragraph is invalid, because it presumes there is no 
speed penalty.  I presume there will be a speed penalty, until 
benchmarking shows otherwise.
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2012-01-05 Thread Serhiy Storchaka

06.01.12 02:10, Nick Coghlan написав(ла):

Not a good idea - a lot of the 3rd party tests that depend on dict
ordering are going to be using those modules anyway, so scattering our
solution across half the standard library is needlessly creating
additional work without really reducing the incompatibility problem.
If we're going to change anything, it may as well be the string
hashing algorithm itself.


Changing the string hashing algorithm will hit the general performance 
and also will break down any code that depend on dict ordering. 
Specialized dict slow down only needed parts of some applications.


___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2012-01-04 Thread Maciej Fijalkowski
On Wed, Jan 4, 2012 at 12:02 AM, Bill Janssen jans...@parc.com wrote:
 Christian Heimes li...@cheimes.de wrote:

 Am 29.12.2011 12:13, schrieb Mark Shannon:
  The attack relies on being able to predict the hash value for a given
  string. Randomising the string hash function is quite straightforward.
  There is no need to change the dictionary code.
 
  A possible (*untested*) patch is attached. I'll leave it for those more
  familiar with unicodeobject.c to do properly.

 I'm worried that hash randomization of str is going to break 3rd party
 software that rely on a stable hash across multiple Python instances.
 Persistence layers like ZODB and cross interpreter communication
 channels used by multiprocessing may (!) rely on the fact that the hash
 of a string is fixed.

 Software that depends on an undefined hash function for synchronization
 and persistence deserves to break, IMO.  There are plenty of
 well-defined hash functions available for this purpose.

 Bill
 ___
 Python-Dev mailing list
 Python-Dev@python.org
 http://mail.python.org/mailman/listinfo/python-dev
 Unsubscribe: 
 http://mail.python.org/mailman/options/python-dev/fijall%40gmail.com

A lot of software will break their tests, because dict ordering would
depend on the particular run. I know, because some of them break on
pypy which has a different dict ordering. This is probably a good
thing in general, but is it really worth it? People will install
python 2.6.newest and stuff *will* break.

Is it *really* a security issue? We knew all along that dicts are
O(n^2) in worst case scenario, how is this suddenly a security
problem?

Cheers,
fijal
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2012-01-04 Thread Antoine Pitrou
On Wed, 4 Jan 2012 09:59:15 +0200
Maciej Fijalkowski fij...@gmail.com wrote:
 
 Is it *really* a security issue? We knew all along that dicts are
 O(n^2) in worst case scenario, how is this suddenly a security
 problem?

Because it has been shown to be exploitable for malicious purposes?

Regards

Antoine.


___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2012-01-04 Thread Christian Heimes
Am 04.01.2012 08:59, schrieb Maciej Fijalkowski:
 Is it *really* a security issue? We knew all along that dicts are
 O(n^2) in worst case scenario, how is this suddenly a security
 problem?

For example Microsoft has released an extraordinary and unscheduled
security patch for the issue between Christmas and New Year. I don't
normally use MS as reference but this should give you a hint about the
severity.

Have you watched the talk yet? http://www.youtube.com/watch?v=R2Cq3CLI6H8

Christian
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2012-01-04 Thread Eric Snow
On Wed, Jan 4, 2012 at 12:59 AM, Maciej Fijalkowski fij...@gmail.com wrote:
 On Wed, Jan 4, 2012 at 12:02 AM, Bill Janssen jans...@parc.com wrote:
 Christian Heimes li...@cheimes.de wrote:

 Am 29.12.2011 12:13, schrieb Mark Shannon:
  The attack relies on being able to predict the hash value for a given
  string. Randomising the string hash function is quite straightforward.
  There is no need to change the dictionary code.
 
  A possible (*untested*) patch is attached. I'll leave it for those more
  familiar with unicodeobject.c to do properly.

 I'm worried that hash randomization of str is going to break 3rd party
 software that rely on a stable hash across multiple Python instances.
 Persistence layers like ZODB and cross interpreter communication
 channels used by multiprocessing may (!) rely on the fact that the hash
 of a string is fixed.

 Software that depends on an undefined hash function for synchronization
 and persistence deserves to break, IMO.  There are plenty of
 well-defined hash functions available for this purpose.

 Bill
 ___
 Python-Dev mailing list
 Python-Dev@python.org
 http://mail.python.org/mailman/listinfo/python-dev
 Unsubscribe: 
 http://mail.python.org/mailman/options/python-dev/fijall%40gmail.com

 A lot of software will break their tests, because dict ordering would
 depend on the particular run. I know, because some of them break on
 pypy which has a different dict ordering. This is probably a good
 thing in general, but is it really worth it? People will install
 python 2.6.newest and stuff *will* break.

So if we're making the new hashing the default and giving an option to
use the old, we should make it _really_ clear in the release
notes/announcement about how to revert the behavior.

-eric


 Is it *really* a security issue? We knew all along that dicts are
 O(n^2) in worst case scenario, how is this suddenly a security
 problem?

 Cheers,
 fijal
 ___
 Python-Dev mailing list
 Python-Dev@python.org
 http://mail.python.org/mailman/listinfo/python-dev
 Unsubscribe: 
 http://mail.python.org/mailman/options/python-dev/ericsnowcurrently%40gmail.com
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2012-01-04 Thread Andrew Bennetts
On Wed, Jan 04, 2012 at 11:55:13AM +0100, Antoine Pitrou wrote:
 On Wed, 4 Jan 2012 09:59:15 +0200
 Maciej Fijalkowski fij...@gmail.com wrote:
  
  Is it *really* a security issue? We knew all along that dicts are
  O(n^2) in worst case scenario, how is this suddenly a security
  problem?
 
 Because it has been shown to be exploitable for malicious purposes?

I don't think that's news either.
http://mail.python.org/pipermail/python-dev/2003-May/035907.html and
http://twistedmatrix.com/pipermail/twisted-python/2003-June/004339.html for
instance show that in 2003 it was clearly known to at least be likely to be an
exploitable DoS in common code (a dict of HTTP headers or HTTP form keys).

There was debate about whether it's the language's responsibility to mitigate
the problem or if apps should use safer designs for handling untrusted input
(e.g. limit the number of keys input is allowed to create, or use something
other than dicts), and debate about just how practical an effective exploit
would be.  But I think it was understood to be a real concern 8 years ago, so
not exactly sudden.

Just because it's old news doesn't make it not a security problem, of course.

-Andrew.

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2012-01-03 Thread Barry Warsaw
On Dec 31, 2011, at 04:56 PM, Guido van Rossum wrote:

Is there a tracker issue yet? The discussion should probably move there.

I think the answer to this was no... until now.

http://bugs.python.org/issue13703

Proposed patches should be linked to this issue now.  Please nosy yourself if
you want to follow the progress.

Cheers,
-Barry


signature.asc
Description: PGP signature
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2012-01-03 Thread Bill Janssen
Christian Heimes li...@cheimes.de wrote:

 Am 29.12.2011 12:13, schrieb Mark Shannon:
  The attack relies on being able to predict the hash value for a given
  string. Randomising the string hash function is quite straightforward.
  There is no need to change the dictionary code.
  
  A possible (*untested*) patch is attached. I'll leave it for those more 
  familiar with unicodeobject.c to do properly.
 
 I'm worried that hash randomization of str is going to break 3rd party
 software that rely on a stable hash across multiple Python instances.
 Persistence layers like ZODB and cross interpreter communication
 channels used by multiprocessing may (!) rely on the fact that the hash
 of a string is fixed.

Software that depends on an undefined hash function for synchronization
and persistence deserves to break, IMO.  There are plenty of
well-defined hash functions available for this purpose.

Bill
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2012-01-03 Thread Terry Reedy

On 1/3/2012 5:02 PM, Bill Janssen wrote:


Software that depends on an undefined hash function for synchronization
and persistence deserves to break, IMO.  There are plenty of
well-defined hash functions available for this purpose.


The doc for id() now says This is an integer which is guaranteed to be 
unique and constant for this object during its lifetime. Since the 
default 3.2.2 hash for my win7 64bit CPython is id-address // 16, it can 
have no longer guarantee. I suggest that hash() doc say something 
similar: http://bugs.python.org/issue13707


--
Terry Jan Reedy

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2012-01-02 Thread Terry Reedy

On 1/2/2012 12:55 AM, Paul McMillan wrote:


Terry Reedy said:
I understood Alexander Klink and Julian Wälde, hash...@alech.de, as saying
that they consider that using a random non-zero start value is sufficient to
make the hash non-vulnerable.


I've been talking to them. They're happy to look at our proposed
changes. They indicate that a non-zero start value is sufficient to
prevent the attack, but D. J. Bernstein disagrees with them. He also
has indicated a willingness to look at our solution.


Great. My main concern currently is that there should be no noticeable 
slowdown for 64 bit builds which are apparently not vulnerable and which 
therefore would get no benefit.


Terry Jan Reedy


___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2012-01-02 Thread Antoine Pitrou
On Sun, 1 Jan 2012 21:55:52 -0800
Paul McMillan p...@mcmillan.ws wrote:
 
 This is similar to the change proposed by Christian Heimes.
 
 Most importantly, I moved the xor with r[x % len_r] down a line.
 Before, it wasn't being applied to the last character.

Shouldn't it be r[i % len(r)] instead?
(refer to yesterday's #python-dev discussion)

 I think Ruby uses FNV-1 with a salt, making it less vulnerable to
 this. FNV is otherwise similar to our existing hash function.

Again, we could re-use FNV-1's primes, since they claim they have
better dispersion properties than the average prime.

Regards

Antoine.


___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2012-01-02 Thread Stephen J. Turnbull
Christian Heimes writes:
  Am 31.12.2011 13:03, schrieb Stephen J. Turnbull:
   I don't know the implementation issues well enough to claim it is a
   solution, but this hasn't been mentioned before AFAICS:
   
   While the dictionary probe has to start with a hash for backward
   compatibility reasons, is there a reason the overflow strategy for
   insertion has to be buckets containing lists?  How about
   double-hashing, etc?
  
  Python's dict implementation doesn't use bucket but open addressing (aka
  closed hashed table). The algorithm for conflict resolution doesn't use
  double hashing. Instead it takes the original and (in most cases) cached
  hash and perturbs the hash with a series of add, multiply and bit shift ops.

In an attack, this is still O(collisions) per probe (as any scheme
where the address of the nth collision is a function of only the
hash), where double hashing should be roughly O(1) (with double the
coefficient).

But that evidently imposes too large a performance burden on non-evil
users, so it's not worth thinking about whether roughly O(1) is
close enough to O(1) to deter or exhaust attackers.  I withdraw the
suggestion.

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2012-01-02 Thread Christian Heimes
Am 02.01.2012 06:55, schrieb Paul McMillan:
 I think Ruby uses FNV-1 with a salt, making it less vulnerable to
 this. FNV is otherwise similar to our existing hash function.
 
 For the record, cryptographically strong hash functions are in the
 neighborhood of 400% slower than our existing hash function.

I've pushed a new patch
http://hg.python.org/features/randomhash/rev/0a65d2462e0c

The changeset adds the murmur3 hash algorithm with some minor changes,
for example more random seeds. At first I was worried that murmur might
be slower than our old hash algorithm. But in fact it seems to be faster!

Pybench 10 rounds on my Core2 Duo 2.60:

  py3k: 3.230 sec
  randomahash: 3.182 sec

Christian
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2012-01-02 Thread Anders J. Munch
 On 1/1/2012 12:28 PM, Christian Heimes wrote:
 I understood Alexander Klink and Julian Wälde, hash...@alech.de, as
 saying that they consider that using a random non-zero start value is
 sufficient to make the hash non-vulnerable.

Sufficient against their current attack.  But will it last?  For a
long-running server, there must be plenty of ways information can leak
that will help guessing that start value.

The alternative, to provide a dict-like datastructure for use with
untrusted input, deserves consideration.  Perhaps something simpler
than a balanced tree would do?  How about a dict-like class that is
built on a lazily sorted list?  Insertions basically just do
list.append and set a dirty-flag, and lookups use bisect - sorting
first if the dirty-flag is set.  It wouldn't be complete dict
replacement by any means, mixing insertions and lookups would have
terrible performance, but for something like POST parameters it should
be good enough.

I half expected to find something like that on activestate recipes
already, but couldn't find any.

regards, Anders
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2012-01-02 Thread Christian Heimes
Am 01.01.2012 19:45, schrieb Terry Reedy:
 On 1/1/2012 10:13 AM, Guido van Rossum wrote:
 PS. Is the collision-generator used in the attack code open source?
 
 As I posted before, Alexander Klink and Julian Wälde gave their project 
 email as hash...@alech.de. Since they indicated disappointment in not 
 hearing from Python, I presume they would welcome engagement.

Somebody should contact Alexander and Julian to let them know, that we
are working on the matter. It should be somebody official for the
initial contact, too. I've included Guido (BDFL), Barry (their initial
security contact) and MvL (most prominent German core dev) in CC, as
they are the logical choice for me.

I'm willing to have a phone call with them once the contact has been
established. IMHO it's slightly easier to talk in native tongue --
Alexander and Julian are German, too.

Christian
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2012-01-02 Thread Georg Brandl
On 01/02/2012 04:47 PM, Christian Heimes wrote:
 Am 01.01.2012 19:45, schrieb Terry Reedy:
 On 1/1/2012 10:13 AM, Guido van Rossum wrote:
 PS. Is the collision-generator used in the attack code open source?
 
 As I posted before, Alexander Klink and Julian Wälde gave their project 
 email as hash...@alech.de. Since they indicated disappointment in not 
 hearing from Python, I presume they would welcome engagement.
 
 Somebody should contact Alexander and Julian to let them know, that we
 are working on the matter. It should be somebody official for the
 initial contact, too. I've included Guido (BDFL), Barry (their initial
 security contact) and MvL (most prominent German core dev) in CC, as
 they are the logical choice for me.
 
 I'm willing to have a phone call with them once the contact has been
 established. IMHO it's slightly easier to talk in native tongue --
 Alexander and Julian are German, too.

I wouldn't expect too much -- they seem rather keen on cheap laughs:

http://twitter.com/#!/bk3n/status/152068096448921600/photo/1/large

Georg

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2012-01-02 Thread Guido van Rossum
On Mon, Jan 2, 2012 at 7:47 AM, Christian Heimes li...@cheimes.de wrote:

 Am 01.01.2012 19:45, schrieb Terry Reedy:
  On 1/1/2012 10:13 AM, Guido van Rossum wrote:
  PS. Is the collision-generator used in the attack code open source?
 
  As I posted before, Alexander Klink and Julian Wälde gave their project
  email as hash...@alech.de. Since they indicated disappointment in not
  hearing from Python, I presume they would welcome engagement.

 Somebody should contact Alexander and Julian to let them know, that we
 are working on the matter. It should be somebody official for the
 initial contact, too. I've included Guido (BDFL), Barry (their initial
 security contact) and MvL (most prominent German core dev) in CC, as
 they are the logical choice for me.

 I'm willing to have a phone call with them once the contact has been
 established. IMHO it's slightly easier to talk in native tongue --
 Alexander and Julian are German, too.


I'm not sure I see the point -- just give them a link to the python-dev
archives.

-- 
--Guido van Rossum (python.org/~guido)
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2012-01-02 Thread Alexey Borzenkov
On Mon, Jan 2, 2012 at 7:18 PM, Christian Heimes li...@cheimes.de wrote:
 Am 02.01.2012 06:55, schrieb Paul McMillan:
 I think Ruby uses FNV-1 with a salt, making it less vulnerable to
 this. FNV is otherwise similar to our existing hash function.

 For the record, cryptographically strong hash functions are in the
 neighborhood of 400% slower than our existing hash function.

 I've pushed a new patch
 http://hg.python.org/features/randomhash/rev/0a65d2462e0c

It seems for 32-bit version you are using pid for the two constants.
Also, it's unclear why you even need to use a random constant for the
final pass, you already use random constant as an initial h1, and it
should be enough, no need to use for k1. Same for 128-bit: k1, k2, k3,
k4 should be initialized to zero, these are key data, they don't need
to be mixed with anything.

Also, I'm not sure how portable is the always_inline attribute, is it
supported on all compilers and all platforms?
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2012-01-02 Thread Alexey Borzenkov
On Mon, Jan 2, 2012 at 10:53 PM, Alexey Borzenkov sna...@gmail.com wrote:
 On Mon, Jan 2, 2012 at 7:18 PM, Christian Heimes li...@cheimes.de wrote:
 Am 02.01.2012 06:55, schrieb Paul McMillan:
 I think Ruby uses FNV-1 with a salt, making it less vulnerable to
 this. FNV is otherwise similar to our existing hash function.

 For the record, cryptographically strong hash functions are in the
 neighborhood of 400% slower than our existing hash function.

 I've pushed a new patch
 http://hg.python.org/features/randomhash/rev/0a65d2462e0c

 It seems for 32-bit version you are using pid for the two constants.
 Also, it's unclear why you even need to use a random constant for the
 final pass, you already use random constant as an initial h1, and it
 should be enough, no need to use for k1. Same for 128-bit: k1, k2, k3,
 k4 should be initialized to zero, these are key data, they don't need
 to be mixed with anything.

Sorry, sent too soon. What I mean is that you're initializing a pretty
big array of values when you only need a 32-bit value. Pid, in my
opinion might be too predictable, it would be a lot better to simply
hash pid and gettimeofday bytes to produce this single 32-bit value
and use it for h1, h2, h3 and h4 in both 32-bit and 128-bit versions.

 Also, I'm not sure how portable is the always_inline attribute, is it
 supported on all compilers and all platforms?
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2012-01-02 Thread Barry Warsaw
On Jan 02, 2012, at 06:38 PM, Georg Brandl wrote:

I wouldn't expect too much -- they seem rather keen on cheap laughs:

http://twitter.com/#!/bk3n/status/152068096448921600/photo/1/large

Heh, so yeah, it won't be me contacting them.

-Barry
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2012-01-01 Thread Guido van Rossum
On Sat, Dec 31, 2011 at 10:57 PM, Paul McMillan p...@mcmillan.ws wrote:

  Still, it would represent an effort for the attacker of a much greater
  magnitude than the current attack. It's all a trade-off -- at some point
  it'll just be easier for the attacker to use some other vulnerability.
 Also
  the class of vulnerable servers would be greatly reduced.

 I agree that doing anything is better than doing nothing. If we use
 the earlier suggestion and prepend everything with a fixed-length
 seed, we need quite a bit of entropy (and so a fairly long string) to
 make a lasting difference.


Ah, but the effect of that long string is summarized in a single (32- or
64-bit) integer.


   I'm not sure I understand this. What's the worry about a different
 class of
  hash functions? (It may be clear that I do not have a deep mathematical
  understanding of hash functions.)

 This was mostly in reference to earlier suggestions of switching to
 cityhash, or using btrees, or other more invasive changes. Python 2.X
 is pretty stable and making large changes like that to the codebase
 can have unpredictable effects.


Agreed.


 We know that the current hash function
 works well (except for this specific problem), so it seems like the
 best fix will be as minimal a modification as possible, to avoid
 introducing bugs.


Yup.


  I forget -- what do we do on systems without urandom()? (E.g. Windows?)
 Windows has CryptGenRandom which is approximately equivalent.

  Let's worry about timing attacks another time okay?
 Agreed. As long as there isn't a gaping hole, I'm fine with that.

  Hm. I'm not sure I like the idea of extra arithmetic for every character
  being hashed.

 From a performance standpoint, this may still be better than adding 8
 or 10 characters to every single hash operation, since most hashes are
 over short strings.


But how about precomputing the intermediate value (x)? The hash is (mostly)
doing x = f(x, c) for each c in the input.

It is important that this function touches every
 character - if it only interacts with a subset of them, an attacker
 can fix that subset and vary the rest.


I sort of see your point, but I still think that if we could add as little
per-character overhead as possible it would be best -- sometimes people
*do* hash very long strings.


  But I like the idea of a bigger random seed from which we
  deterministically pick some part.

 Yeah. This makes it much harder to attack, since it very solidly
 places the attacker outside the realm of just brute force the key.

  How about just initializing x to some
  subsequence of the seed determined by e.g. the length of the hashed
 string
  plus a few characters from it?

 We did consider this, and if performance is absolutely the prime
 directive, this (or a variant) may be the best option. Unfortunately,
 the collision generator doesn't necessarily vary the length of the
 string. Additionally, if we don't vary based on all the letters in the
 string, an attacker can fix the characters that we do use and generate
 colliding strings around them.


Still, much more work for the attacker.


 Another option to consider would be to apply this change to some but
 not all of the rounds. If we added the seed lookup xor operation for
 only the first and last 5 values of x, we would still retain much of
 the benefit without adding much computational overhead for very long
 strings.


I like that.


 We could also consider a less computationally expensive operation than
 the modulo for calculating the lookup index, like simply truncating to
 the correct number of bits.


Sure. Thanks for thinking about all the details here!!

-- 
--Guido van Rossum (python.org/~guido)
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2012-01-01 Thread Guido van Rossum
Different concern. What if someone were to have code implementing an
external, persistent hash table, using Python's hash() function? They might
have a way to rehash everything when a new version of Python comes along,
but they would not be happy if hash() is different in each process. I
somehow vaguely remember possibly having seen such code, or something else
where a bit of random data was needed and hash() was used since it's so
easily available.

-- 
--Guido van Rossum (python.org/~guido)
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2012-01-01 Thread Guido van Rossum
PS. Is the collision-generator used in the attack code open source?

-- 
--Guido van Rossum (python.org/~guido)
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2012-01-01 Thread Christian Heimes
Am 01.01.2012 16:13, schrieb Guido van Rossum:
 Different concern. What if someone were to have code implementing an
 external, persistent hash table, using Python's hash() function? They
 might have a way to rehash everything when a new version of Python comes
 along, but they would not be happy if hash() is different in each
 process. I somehow vaguely remember possibly having seen such code, or
 something else where a bit of random data was needed and hash() was used
 since it's so easily available.

I had the same concern as you and was worried that projects like ZODB
might require a stable hash function. Fred already stated that ZODB
doesn't use the hash in its btree structures.

Possible solutions:

 * make it possible to provide the seed as an env var

 * disable randomizing as default setting or at least add an option to
disable randomization

IMHO the issue needs a PEP that explains the issue, shows all possible
solutions and describes how we have solved the issue. I'm willing to
start a PEP. Who likes to be the co-author?

Christian
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2012-01-01 Thread Christian Heimes
Am 31.12.2011 23:38, schrieb Terry Reedy:
 On 12/31/2011 4:43 PM, PJ Eby wrote:
 
 Here's an idea.  Suppose we add a sys.hash_seed or some such, that's
 settable to an int, and defaults to whatever we're using now.  Then
 programs that want a fix can just set it to a random number,
 
 I do not think we can allow that to change once there are hashed 
 dictionaries existing.

Me, too. Armin suggested to use an env var as random.
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2012-01-01 Thread Christian Heimes
Am 01.01.2012 00:56, schrieb Guido van Rossum:
 ISTM the only reasonable thing is to have a random seed picked very
 early in the process, to be used to change the hash() function of
 str/bytes/unicode (in a way that they are still compatible with each other).
 
 The seed should be unique per process except it should survive fork()
 (but not exec()). I'm not worried about unrelated processes needing to
 have the same hash(), but I'm not against offering an env variable or
 command line flag to force the seed.

I've created a clone at http://hg.python.org/features/randomhash/ as a
testbed. The code creates the seed very early in PyInitializeEx(). The
method isn't called on fork() but on exec().

 I'm not too concerned about a 3rd party being able to guess the random
 seed -- this would require much more effort on their part, since they
 would have to generate a new set of colliding keys each time they think
 they have guessed the hash (as long as they can't force the seed -- this
 actually argues slightly *against* offering a way to force the seed,
 except that we have strong backwards compatibility requirements).

The talkers claim and have shown that it's too easy to pre-calculate
collisions with hashing algorithms similar to DJBX33X / DJBX33A. It
might be a good idea to change the hashing algorithm, too. Paul as
listed some new algorithms. Ruby 1.9 is using FNV
http://isthe.com/chongo/tech/comp/fnv/ which promises to be fast with a
good dispersion pattern. A hashing algorithm without a
meet-in-the-middle vulnerability would reduce the pressure on a good and
secure seed, too.

 We need to fix this as far back as Python 2.6, and it would be nice if a
 source patch was available that works on Python 2.5 -- personally I do
 have a need for a 2.5 fix and if nobody creates one I will probably end
 up backporting the fix from 2.6 to 2.5.

+1

Should the randomization be disabled on 2.5 to 3.2 by default to reduce
backward compatibility issues?

Christian
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2012-01-01 Thread Christian Heimes
Am 01.01.2012 05:11, schrieb Antoine Pitrou:
 On Sat, 31 Dec 2011 16:56:00 -0700
 Guido van Rossum gu...@python.org wrote:
 ISTM the only reasonable thing is to have a random seed picked very early
 in the process, to be used to change the hash() function of
 str/bytes/unicode (in a way that they are still compatible with each other).
 
 Do str and bytes still have to be compatible with each other in 3.x?

py3k has tests for hash(ascii) == hash(bascii). Are you talking
about this invariant?

Christian
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2012-01-01 Thread Antoine Pitrou
On Sun, 01 Jan 2012 16:48:32 +0100
Christian Heimes li...@cheimes.de wrote:
 The talkers claim and have shown that it's too easy to pre-calculate
 collisions with hashing algorithms similar to DJBX33X / DJBX33A. It
 might be a good idea to change the hashing algorithm, too. Paul as
 listed some new algorithms. Ruby 1.9 is using FNV
 http://isthe.com/chongo/tech/comp/fnv/ which promises to be fast with a
 good dispersion pattern.

We already seem to be using a FNV-alike, is it just a matter of
changing the parameters?

 A hashing algorithm without a
 meet-in-the-middle vulnerability would reduce the pressure on a good and
 secure seed, too.
 
  We need to fix this as far back as Python 2.6, and it would be nice if a
  source patch was available that works on Python 2.5 -- personally I do
  have a need for a 2.5 fix and if nobody creates one I will probably end
  up backporting the fix from 2.6 to 2.5.
 
 +1
 
 Should the randomization be disabled on 2.5 to 3.2 by default to reduce
 backward compatibility issues?

Isn't 2.5 already EOL'ed?
As for 3.2, I'd say no. I don't know about 2.6 and 2.7.

Regards

Antoine.


___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2012-01-01 Thread Antoine Pitrou
On Sun, 01 Jan 2012 16:56:19 +0100
Christian Heimes li...@cheimes.de wrote:
 Am 01.01.2012 05:11, schrieb Antoine Pitrou:
  On Sat, 31 Dec 2011 16:56:00 -0700
  Guido van Rossum gu...@python.org wrote:
  ISTM the only reasonable thing is to have a random seed picked very early
  in the process, to be used to change the hash() function of
  str/bytes/unicode (in a way that they are still compatible with each 
  other).
  
  Do str and bytes still have to be compatible with each other in 3.x?
 
 py3k has tests for hash(ascii) == hash(bascii). Are you talking
 about this invariant?

Yes. It doesn't seem to have any point anymore.

Regards

Antoine.
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2012-01-01 Thread Christian Heimes
Am 01.01.2012 06:57, schrieb Paul McMillan:
 I agree that doing anything is better than doing nothing. If we use
 the earlier suggestion and prepend everything with a fixed-length
 seed, we need quite a bit of entropy (and so a fairly long string) to
 make a lasting difference.

Your code at https://gist.github.com/0a91e52efa74f61858b5 reads about 2
MB (2**21 - 1) data from urandom. I'm worried that this is going to
exhaust the OS's random pool and suck it dry. We shouldn't forget that
Python is used for long running processes as well as short scripts. Your
suggestion also increases the process size by 2 MB which is going to be
an issue for mobile and embedded platforms.

How about this:

r = [ord(i) for i in os.urandom(256)]
rs = os.urandom(4) # or 8 ?
seed = rs[-1] + (rs[-2]  8) + (rs[-3]  16) + (rs[-4]  24)

def _hash_string(s):
The algorithm behind compute_hash() for a string or a unicode.
from pypy.rlib.rarithmetic import intmask
length = len(s)
if length == 0:
return -1
x = intmask(seed + (ord(s[0])  7))
i = 0
while i  length:
o = ord(s[i])
x = intmask((103*x) ^ o ^ r[o % 0xff]
i += 1
x ^= length
return intmask(x)

This combines a random seed for the hash with your suggestion.

We also need to special case short strings. The above routine hands over
the seed to attackers, if he is able to retrieve lots of single
character hashes. The randomization shouldn't be used if we can prove
that it's not possible to create hash collisions for strings shorter
than X. For example 64bit FNV-1 has no collisions for 8 chars or less,
32bit FNV has no collisions for 4 or less cars.

Christian

Christian
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2012-01-01 Thread Christian Heimes
Am 01.01.2012 17:09, schrieb Antoine Pitrou:
 On Sun, 01 Jan 2012 16:48:32 +0100
 Christian Heimes li...@cheimes.de wrote:
 The talkers claim and have shown that it's too easy to pre-calculate
 collisions with hashing algorithms similar to DJBX33X / DJBX33A. It
 might be a good idea to change the hashing algorithm, too. Paul as
 listed some new algorithms. Ruby 1.9 is using FNV
 http://isthe.com/chongo/tech/comp/fnv/ which promises to be fast with a
 good dispersion pattern.
 
 We already seem to be using a FNV-alike, is it just a matter of
 changing the parameters?

No, we are using something similar to DJBX33X. FNV is a completely
different type of hash algorithm.
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2012-01-01 Thread Antoine Pitrou
Le dimanche 01 janvier 2012 à 17:34 +0100, Christian Heimes a écrit :
 Am 01.01.2012 17:09, schrieb Antoine Pitrou:
  On Sun, 01 Jan 2012 16:48:32 +0100
  Christian Heimes li...@cheimes.de wrote:
  The talkers claim and have shown that it's too easy to pre-calculate
  collisions with hashing algorithms similar to DJBX33X / DJBX33A. It
  might be a good idea to change the hashing algorithm, too. Paul as
  listed some new algorithms. Ruby 1.9 is using FNV
  http://isthe.com/chongo/tech/comp/fnv/ which promises to be fast with a
  good dispersion pattern.
  
  We already seem to be using a FNV-alike, is it just a matter of
  changing the parameters?
 
 No, we are using something similar to DJBX33X. FNV is a completely
 different type of hash algorithm.

I don't understand. FNV-1 multiplies the current running result with a
prime and then xors it with the following byte. This is also what we do.
(I'm assuming 103 is prime)

I see two differences:
- FNV starts with a non-zero constant offset basis
- FNV uses a different prime than ours

(as a side note, FNV operates on bytes, but for unicode we must operate
on code points in [0, 1114111]: although arguably the common case is
hashing ASCII substrings (protocol tokens etc.))

Regards

Antoine.


___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2012-01-01 Thread Christian Heimes
Am 01.01.2012 17:54, schrieb Antoine Pitrou:
 I don't understand. FNV-1 multiplies the current running result with a
 prime and then xors it with the following byte. This is also what we do.
 (I'm assuming 103 is prime)

There must be a major difference somewhere inside the algorithm. The
talk at the CCC conference in Berlin mentions that Ruby 1.9 is not
vulnerable to meet-in-the-middle attacks and Ruby 1.9 uses FNV. The C
code of FNV is more complex than our code, too.

Christian
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2012-01-01 Thread Christian Heimes
Am 01.01.2012 01:11, schrieb Guido van Rossum:
 FWIW I managed to build Python 2.6, and a trivial mutation of the
 string/unicode hash function (add 1 initially) made only three tests
 fail; test_symtable and test_json both have a dependency on dictionary
 order, test_ctypes I can't quite figure out what's going on.

In my fork, these tests are failing:

test_dbm test_dis test_gdb test_inspect test_packaging test_set
test_symtable test_urllib test_userdict test_collections
test_json

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2012-01-01 Thread Victor Stinner

Le 01/01/2012 04:29, Paul McMillan a écrit :

This is incorrect. Once an attacker has guessed the random seed, any
operation which reveals the ordering of hashed objects can be used to
verify the answer. JSON responses would be ideal. In fact, an attacker
can do a brute-force attack of the random seed offline. Once they have
the seed, generating collisions is a fast process.


If we want to protect a website against this attack for example, we must 
suppose that the attacker can inject arbitrary data and can get 
(indirectly) the result of hash(str) (e.g. with the representation of a 
dict in a traceback, with a JSON output, etc.).



The goal isn't perfection, but we need to do better than a simple
salt.


I disagree. I don't want to break backward compatibility and have a 
hash() function different for each process, if the change is not an 
effective protection against the hash vulnerability.


It's really hard to write a good (secure) hash function: see for example 
the recent NIST competition (started in 2008, will end this year). Even 
good security researcher are unable to write a strong and fast hash 
function. It's easy to add a weakness in the function if you don't have 
a good background in cryptography. The NIST competition gives 4 years to 
analyze new hash functions. We should not rush to add a quick hack if 
it doesn't solve correctly the problem (protect against a collision 
attack and preimage attack).


http://en.wikipedia.org/wiki/NIST_hash_function_competition
http://en.wikipedia.org/wiki/Collision_attack

Runtime performance does matter, I'm not completly sure that changing 
Python is the best place to add a countermeasure against a 
vulnerability. I don't want to slow down numpy for a web vulnerability. 
Because there are different use cases, a better compromise is maybe to 
add a runtime option to use a secure hash function, and keep the unsafe 
but fast hash function by default.



I propose we modify the string hash function like this:

https://gist.github.com/0a91e52efa74f61858b5


Always allocate 2**21 bytes just to workaround one specific kind of 
attack is not acceptable. I suppose that the maximum acceptable is 4096 
bytes (or better 256 bytes).


Crytographic hash functions don't need random data, why would Python 
need 2 MB (!) for its hash function?


Victor
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2012-01-01 Thread Terry Reedy

On 1/1/2012 10:13 AM, Guido van Rossum wrote:

PS. Is the collision-generator used in the attack code open source?


As I posted before, Alexander Klink and Julian Wälde gave their project 
email as hash...@alech.de. Since they indicated disappointment in not 
hearing from Python, I presume they would welcome engagement.


--
Terry Jan Reedy


___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2012-01-01 Thread Terry Reedy

On 1/1/2012 12:28 PM, Christian Heimes wrote:

Am 01.01.2012 17:54, schrieb Antoine Pitrou:

I don't understand. FNV-1 multiplies the current running result with a
prime and then xors it with the following byte. This is also what we do.
(I'm assuming 103 is prime)


There must be a major difference somewhere inside the algorithm. The
talk at the CCC conference in Berlin mentions that Ruby 1.9 is not
vulnerable to meet-in-the-middle attacks and Ruby 1.9 uses FNV. The C
code of FNV is more complex than our code, too.


I understood Alexander Klink and Julian Wälde, hash...@alech.de, as 
saying that they consider that using a random non-zero start value is 
sufficient to make the hash non-vulnerable.


--
Terry Jan Reedy


___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


[Python-Dev] Hash collision security issue (now public)

2012-01-01 Thread Jim Jewett
Steven D'Aprano (in
http://mail.python.org/pipermail/python-dev/2011-December/115162.html)
wrote:

 By compile-time, do you mean when the byte-code is compilated, i.e. just
 before runtime, rather than a switch when compiling the Python executable from
 source?

No.  I really mean when the C code is initially compiled to produce an
python executable.

The only reason we're worrying about this is that an adversary may
force worst-case performance.  If the python instance isn't a server,
or at least isn't exposed to untrusted clients, then even a single
extra if test is unjustified overhead.  Adding overhead to every
string hash or every dict lookup is bad.

That said, adding some overhead (only) to dict lookups *that already
hit half a dozen consecutive collisions* probably is reasonable,
because that won't happen very often with normal data.  (6 collisions
can't happen at all unless there are already at least 6 entries, so
small dicts are safe; with at least 1/3 of the slots empty, it should
happen only 1/729 for worst-size larger dicts.)

-jJ
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2012-01-01 Thread Paul McMillan
 But how about precomputing the intermediate value (x)? The hash is (mostly)
 doing x = f(x, c) for each c in the input.

That's a fair point. If we go down that avenue, I think simply
choosing a random fixed starting value for x is the correct choice,
rather than computing an intermediate value.

 I sort of see your point, but I still think that if we could add as little
 per-character overhead as possible it would be best -- sometimes people *do*
 hash very long strings.

Yep, agreed. My original proposal did not adequately address this.

 Another option to consider would be to apply this change to some but
 not all of the rounds. If we added the seed lookup xor operation for
 only the first and last 5 values of x, we would still retain much of
 the benefit without adding much computational overhead for very long
 strings.

 I like that.

I believe this is a reasonable solution. An attacker could still
manipulate the internal state of long strings, but the additional
information at both ends should make that difficult to exploit. I'll
talk it over with the reviewers.

 We could also consider a less computationally expensive operation than
 the modulo for calculating the lookup index, like simply truncating to
 the correct number of bits.

 Sure. Thanks for thinking about all the details here!!

Again, I'll talk to the reviewers (and run the randomness test
battery) to be double-check that this doesn't affect the distribution
in some unexpected way, but I think it should be fine.

 PS. Is the collision-generator used in the attack code open source?

Not in working form, and they've turned down requests for it from
other projects that want to check their work. If it's imperative that
we have one, I can write one, but I'd rather not spend the effort if
we don't need it.

-Paul
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2012-01-01 Thread Paul McMillan
 Different concern. What if someone were to have code implementing an
 external, persistent hash table, using Python's hash() function? They might
 have a way to rehash everything when a new version of Python comes along,
 but they would not be happy if hash() is different in each process. I
 somehow vaguely remember possibly having seen such code, or something else
 where a bit of random data was needed and hash() was used since it's so
 easily available.

I agree that there are use cases for allowing users to choose the
random seed, in much the same way it's helpful to be able to set it
for the random number generator. This should probably be something
that can be passed in at runtime. This feature would also be useful
for users who want to synchronize the hashes of multiple independent
processes, for whatever reason. For the general case though,
randomization should be on by default.

-Paul
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


[Python-Dev] Hash collision security issue (now public)

2012-01-01 Thread Jim Jewett
Paul McMillan in
http://mail.python.org/pipermail/python-dev/2012-January/115183.html
wrote:

 Guido van Rossum wrote:
 Hm. I'm not sure I like the idea of extra arithmetic for every character
 being hashed.

 the collision generator doesn't necessarily vary the length of the
 string. Additionally, if we don't vary based on all the letters in the
 string, an attacker can fix the characters that we do use and generate
 colliding strings around them.

If the new hash algorithm doesn't kick in before, say, 32 characters,
then most currently hashed strings will not be affected.  And if the
attacker has to add 32 characters to every key, it reduces the this
can be done with only N bytes uploaded risk.  (The same logic
would apply to even longer prefixes, except that an attacker might
more easily find short-enough strings that collide.)

 We could also consider a less computationally expensive operation
 than the modulo for calculating the lookup index, like simply truncating
 to the correct number of bits.

Given that the modulo is always 2^N, how is that different?

-jJ
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


[Python-Dev] Hash collision security issue (now public)

2012-01-01 Thread Jim Jewett
Victor Stinner wrote in
http://mail.python.org/pipermail/python-dev/2012-January/115198.html

 If we want to protect a website against this attack for example, we must
 suppose that the attacker can inject arbitrary data and can get
 (indirectly) the result of hash(str) (e.g. with the representation of a
 dict in a traceback, with a JSON output, etc.).

(1)  Is it common to hash non-string input?  Because generating integers
that collide for certain dict sizes is pretty easy...

(2)  Would it make sense for traceback printing to sort dict keys?  (Any site
worried about this issue should already be hiding tracebacks from untrusted
clients, but the cost of this extra protection may be pretty small, given that
tracebacks shouldn't be printed all that often in the first place.)

(3)  Should the docs for json.encoder.JSONEncoder suggest sort_keys=True?

-jJ
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2012-01-01 Thread Paul McMillan
I fixed a couple things in my proposed algorithm:
https://gist.github.com/0a91e52efa74f61858b5

I had a typo, and used 21 instead of 12 for the size multiplier. We
definitely don't need 2MB random data.

The initialization of r was broken. Now it is an array of ints, so
there's no conversion when it's used. I've adjusted it so there's 8k
of random data, broken into 2048 ints.

I added a length-based seed to the initial value of x. This prevents
single-characters from being used to enumerate raw values from r.
This is similar to the change proposed by Christian Heimes.

Most importantly, I moved the xor with r[x % len_r] down a line.
Before, it wasn't being applied to the last character.

 Christian Heimes said:
 We also need to special case short strings. The above routine hands over
 the seed to attackers, if he is able to retrieve lots of single
 character hashes.

The updated code always includes at least 2 lookups from r, which I
believe solves the single-character enumeration problem. If we
special-case part of our hash function for short strings, we may get
suboptimal collisions between the two types of hashes.

I think Ruby uses FNV-1 with a salt, making it less vulnerable to
this. FNV is otherwise similar to our existing hash function.

For the record, cryptographically strong hash functions are in the
neighborhood of 400% slower than our existing hash function.

 Terry Reedy said:
 I understood Alexander Klink and Julian Wälde, hash...@alech.de, as saying
 that they consider that using a random non-zero start value is sufficient to
 make the hash non-vulnerable.

I've been talking to them. They're happy to look at our proposed
changes. They indicate that a non-zero start value is sufficient to
prevent the attack, but D. J. Bernstein disagrees with them. He also
has indicated a willingness to look at our solution.

-Paul
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2011-12-31 Thread Stephen J. Turnbull
Victor Stinner writes:

  Let's try to summarize this vulnerability.
  
  The creation of a Python dictionary has a complexity of O(n) in most 
  cases, but O(n^2) in the *worst* case. The attack tries to go into the 
  worst case. It requires to compute a set of N keys having the same hash 
  value (hash(key1) == hash(key2) == ... hash(keyN)). It only has to 
  compute these keys once. It looks like it is now cheap enough in 
  practice to compute this dataset for Python (and other languages).

I don't know the implementation issues well enough to claim it is a
solution, but this hasn't been mentioned before AFAICS:

While the dictionary probe has to start with a hash for backward
compatibility reasons, is there a reason the overflow strategy for
insertion has to be buckets containing lists?  How about
double-hashing, etc?
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2011-12-31 Thread Christian Heimes
Am 31.12.2011 13:03, schrieb Stephen J. Turnbull:
 I don't know the implementation issues well enough to claim it is a
 solution, but this hasn't been mentioned before AFAICS:
 
 While the dictionary probe has to start with a hash for backward
 compatibility reasons, is there a reason the overflow strategy for
 insertion has to be buckets containing lists?  How about
 double-hashing, etc?

Python's dict implementation doesn't use bucket but open addressing (aka
closed hashed table). The algorithm for conflict resolution doesn't use
double hashing. Instead it takes the original and (in most cases) cached
hash and perturbs the hash with a series of add, multiply and bit shift ops.
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2011-12-31 Thread martin


Zitat von Victor Stinner victor.stin...@haypocalc.com:


The current implementation of dict uses a linked-list.

[...]

Tell me if I am wrong.


At least with this statement, you are wrong: the current
implementation does *not* use a linked-list. Instead, it
uses open addressing.

Regards,
Martin



___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2011-12-31 Thread PJ Eby
On Sat, Dec 31, 2011 at 7:03 AM, Stephen J. Turnbull step...@xemacs.orgwrote:

 While the dictionary probe has to start with a hash for backward
 compatibility reasons, is there a reason the overflow strategy for
 insertion has to be buckets containing lists?  How about
 double-hashing, etc?


This won't help, because the keys still have the same hash value. ANYTHING
you do to them after they're generated will result in them still colliding.

The *only* thing that works is to change the hash function in such a way
that the strings end up with different hashes in the first place.
 Otherwise, you'll still end up with (deliberate) collisions.

(Well, technically, you could use trees or some other O log n data
structure as a fallback once you have too many collisions, for some value
of too many.  Seems a bit wasteful for the purpose, though.)
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2011-12-31 Thread Jeffrey Yasskin
On Wed, Dec 28, 2011 at 5:37 PM, Jesse Noller jnol...@gmail.com wrote:


 On Wednesday, December 28, 2011 at 8:28 PM, Michael Foord wrote:

 Hello all,

 A paper (well, presentation) has been published highlighting security 
 problems with the hashing algorithm (exploiting collisions) in many 
 programming languages Python included:

 http://events.ccc.de/congress/2011/Fahrplan/attachments/2007_28C3_Effective_DoS_on_web_application_platforms.pdf

 Although it's a security issue I'm posting it here because it is now public 
 and seems important.

 The issue they report can cause (for example) handling an http post to 
 consume horrible amounts of cpu. For Python the figures they quoted:

 reasonable-sized attack strings only for 32 bits Plone has max. POST size of 
 1 MB
 7 minutes of CPU usage for a 1 MB request
 ~20 kbits/s → keep one Core Duo core busy

 This was apparently reported to the security list, but hasn't been responded 
 to beyond an acknowledgement on November 24th (the original report didn't 
 make it onto the security list because it was held in a moderation queue).

 The same vulnerability was reported against various languages and web 
 frameworks, and is already fixed in some of them.

 Their recommended fix is to randomize the hash function.

 All the best,

 Michael

 Back up link for the PDF:
 http://dl.dropbox.com/u/1374/2007_28C3_Effective_DoS_on_web_application_platforms.pdf

 Ocert disclosure:
 http://www.ocert.org/advisories/ocert-2011-003.html

Discussion of hash functions in general:
http://burtleburtle.net/bob/hash/doobs.html
Two of the best hash functions that currently exist:
http://code.google.com/p/cityhash/ and
http://code.google.com/p/smhasher/wiki/MurmurHash.

I'm not sure exactly what problem the paper is primarily complaining about:
1. Multiply+add and multiply+xor hashes are weak: this would be solved
by changing to either of the better-and-faster hashes I linked to
above. On the other hand:
http://mail.python.org/pipermail/python-3000/2007-September/010327.html
2. It's possible to find collisions in any hash function in a 32-bit
space: only solved by picking a varying seed at startup or compile
time.

If you decide to change to a stronger hash function overall, it might
also be useful to change the advice to somehow mix together (e.g.
using exclusive or) the hash values for the components in
http://docs.python.org/py3k/reference/datamodel.html#object.__hash__.
hash(tuple(components)) will likely be better if tuple's hash is
improved.

Hash functions are already unstable across Python versions. Making
them unstable across interpreter processes (multiprocessing doesn't
share dicts, right?) doesn't sound like a big additional problem.
Users who want a distributed hash table will need to pull their own
hash function out of hashlib or re-implement a non-cryptographic hash
instead of using the built-in one, but they probably need to do that
already to allow themselves to upgrade Python.

Jeffrey
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2011-12-31 Thread PJ Eby
On Sat, Dec 31, 2011 at 4:04 PM, Jeffrey Yasskin jyass...@gmail.com wrote:

 Hash functions are already unstable across Python versions. Making
 them unstable across interpreter processes (multiprocessing doesn't
 share dicts, right?) doesn't sound like a big additional problem.
 Users who want a distributed hash table will need to pull their own
 hash function out of hashlib or re-implement a non-cryptographic hash
 instead of using the built-in one, but they probably need to do that
 already to allow themselves to upgrade Python.


Here's an idea.  Suppose we add a sys.hash_seed or some such, that's
settable to an int, and defaults to whatever we're using now.  Then
programs that want a fix can just set it to a random number, and on Python
versions that support it, it takes effect.  Everywhere else it's a silent
no-op.

Downside: sys has to have slots for this to work; does sys actually have
slots?  My memory's hazy on that.  I guess actually it'd have to be
sys.set_hash_seed().  But same basic idea.

Anyway, this would make fixing the problem *possible*, while still pushing
off the hard decisions to the app/framework developers.  ;-)

Downside: every hash operation includes one extra memory access, but
strings only compute their hash once anyway.)

Given that changing dict won't help, and changing the default hash is a
non-starter, an option to set the seed is probably the way to go.  (Maybe
with an environment variable and/or command line option so users can work
around old code.)
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2011-12-31 Thread Terry Reedy

On 12/31/2011 4:43 PM, PJ Eby wrote:


Here's an idea.  Suppose we add a sys.hash_seed or some such, that's
settable to an int, and defaults to whatever we're using now.  Then
programs that want a fix can just set it to a random number,


I do not think we can allow that to change once there are hashed 
dictionaries existing.


--
Terry Jan Reedy

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2011-12-31 Thread Guido van Rossum
ISTM the only reasonable thing is to have a random seed picked very early
in the process, to be used to change the hash() function of
str/bytes/unicode (in a way that they are still compatible with each other).

The seed should be unique per process except it should survive fork() (but
not exec()). I'm not worried about unrelated processes needing to have the
same hash(), but I'm not against offering an env variable or command line
flag to force the seed.

I'm not too concerned about a 3rd party being able to guess the random seed
-- this would require much more effort on their part, since they would have
to generate a new set of colliding keys each time they think they have
guessed the hash (as long as they can't force the seed -- this actually
argues slightly *against* offering a way to force the seed, except that we
have strong backwards compatibility requirements).

We need to fix this as far back as Python 2.6, and it would be nice if a
source patch was available that works on Python 2.5 -- personally I do have
a need for a 2.5 fix and if nobody creates one I will probably end up
backporting the fix from 2.6 to 2.5.

Is there a tracker issue yet? The discussion should probably move there.

PS. I would propose a specific fix but I can't seem to build a working
CPython from the trunk on my laptop (OS X 10.6, Xcode 4.1). I get this
error late in the build:

./python.exe -SE -m sysconfig --generate-posix-vars
Fatal Python error: Py_Initialize: can't initialize sys standard streams
Traceback (most recent call last):
  File /Users/guido/cpython/Lib/io.py, line 60, in module
make: *** [Lib/_sysconfigdata.py] Abort trap

-- 
--Guido van Rossum (python.org/~guido)
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2011-12-31 Thread Guido van Rossum
On Sat, Dec 31, 2011 at 4:56 PM, Guido van Rossum gu...@python.org wrote:

 PS. I would propose a specific fix but I can't seem to build a working
 CPython from the trunk on my laptop (OS X 10.6, Xcode 4.1). I get this
 error late in the build:

 ./python.exe -SE -m sysconfig --generate-posix-vars
 Fatal Python error: Py_Initialize: can't initialize sys standard streams
 Traceback (most recent call last):
   File /Users/guido/cpython/Lib/io.py, line 60, in module
 make: *** [Lib/_sysconfigdata.py] Abort trap


FWIW I managed to build Python 2.6, and a trivial mutation of the
string/unicode hash function (add 1 initially) made only three tests fail;
test_symtable and test_json both have a dependency on dictionary order,
test_ctypes I can't quite figure out what's going on.

Oh, and an unrelated failure in test_sqlite:

  File /Users/guido/pythons/p26/Lib/sqlite3/test/types.py, line 355, in
CheckSqlTimestamp
self.failUnlessEqual(ts.year, now.year)
AssertionError: 2012 != 2011

I betcha that's because it's still 2011 here in Texas but already 2012 in
UTC-land. Happy New Year everyone! :-)

-- 
--Guido van Rossum (python.org/~guido)
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hash collision security issue (now public)

2011-12-31 Thread Paul McMillan
 I'm not too concerned about a 3rd party being able to guess the random seed
 -- this would require much more effort on their part, since they would have
 to generate a new set of colliding keys each time they think they have
 guessed the hash

This is incorrect. Once an attacker has guessed the random seed, any
operation which reveals the ordering of hashed objects can be used to
verify the answer. JSON responses would be ideal. In fact, an attacker
can do a brute-force attack of the random seed offline. Once they have
the seed, generating collisions is a fast process.

The goal isn't perfection, but we need to do better than a simple
salt. I propose we modify the string hash function like this:

https://gist.github.com/0a91e52efa74f61858b5

This code is based on PyPy's implementation, but the concept is
universal. Rather than choosing a single short random seed per
process, we generate a much larger random seed (r). As we hash, we
deterministically choose a portion of that seed and incorporate it
into the hash process. This modification is a minimally intrusive
change to the existing hash function, and so should not introduce
unexpected side effects which might come from switching to a different
class of hash functions.

I've worked through this code with Alex Gaynor, Antoine Pitrou, and
Victor Stinner, and have asked several mathematicians and security
experts to review the concept. The reviewers who have gotten back to
me thus far have agreed that if the initial random seed is not flawed,
this should not overly change the properties of the hash function, but
should make it quite difficult for an attacker to deduce the necessary
information to predictably cause hash collisions. This function is not
designed to protect against timing attacks, but should be nontrivial
to reverse even with access to timing data.

Empirical testing shows that this unoptimized python implementation
produces ~10% slowdown in the hashing of ~20 character strings. This
is probably an acceptable trade off, and actually provides better
performance in the case of short strings than a high-entropy
fixed-length seed prefix.

-Paul
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


  1   2   >