On Sat, Dec 31, 2011 at 10:57 PM, Paul McMillan <p...@mcmillan.ws> wrote:
> > Still, it would represent an effort for the attacker of a much greater
> > magnitude than the current attack. It's all a trade-off -- at some point
> > it'll just be easier for the attacker to use some other vulnerability. Also
> > the class of vulnerable servers would be greatly reduced.

> I agree that doing anything is better than doing nothing. If we use
> the earlier suggestion and prepend everything with a fixed-length
> seed, we need quite a bit of entropy (and so a fairly long string) to
> make a lasting difference.

Ah, but the effect of that long string is summarized in a single (32- or
64-bit) integer.

> > I'm not sure I understand this. What's the worry about "a different
> > class of hash functions"? (It may be clear that I do not have a deep
> > mathematical understanding of hash functions.)

> This was mostly in reference to earlier suggestions of switching to
> CityHash, or using btrees, or other more invasive changes. Python 2.X
> is pretty stable, and making large changes like that to the codebase
> can have unpredictable effects.

Agreed.

> We know that the current hash function works well (except for this
> specific problem), so it seems like the best fix will be as minimal a
> modification as possible, to avoid introducing bugs.

Yup.

> > I forget -- what do we do on systems without urandom()? (E.g. Windows?)

> Windows has CryptGenRandom, which is approximately equivalent.

> > Let's worry about timing attacks another time, okay?

> Agreed. As long as there isn't a gaping hole, I'm fine with that.

> > Hm. I'm not sure I like the idea of extra arithmetic for every
> > character being hashed.

> From a performance standpoint, this may still be better than adding 8
> or 10 characters to every single hash operation, since most hashes are
> over short strings.

But how about precomputing the intermediate value (x)? The hash is
(mostly) doing x = f(x, c) for each c in the input.

> It is important that this function touches every character - if it
> only interacts with a subset of them, an attacker can fix that subset
> and vary the rest.

I sort of see your point, but I still think that if we could add as
little per-character overhead as possible it would be best -- sometimes
people *do* hash very long strings.

> > But I like the idea of a bigger random seed from which we
> > deterministically pick some part.

> Yeah. This makes it much harder to attack, since it very solidly
> places the attacker outside the realm of "just brute force the key".

> > How about just initializing x to some subsequence of the seed
> > determined by e.g. the length of the hashed string plus a few
> > characters from it?

> We did consider this, and if performance is absolutely the prime
> directive, this (or a variant) may be the best option. Unfortunately,
> the collision generator doesn't necessarily vary the length of the
> string. Additionally, if we don't vary based on all the letters in the
> string, an attacker can fix the characters that we do use and generate
> colliding strings around them.

Still, much more work for the attacker.

> Another option to consider would be to apply this change to some but
> not all of the rounds. If we added the seed lookup xor operation for
> only the first and last 5 values of x, we would still retain much of
> the benefit without adding much computational overhead for very long
> strings.

I like that.
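For concreteness, here is a rough Python sketch of that last variant:
the existing per-character loop (the real thing is C code, in
Objects/stringobject.c for 2.x), plus a seed-table xor applied only to
the first and last 5 values of x. The table size, the choice of 5
rounds, and all the names below are illustrative, not a patch:

    import os
    import struct

    # Hypothetical seed table: 256 random 64-bit words read from
    # os.urandom() once at interpreter startup (sizes illustrative).
    TABLE_SIZE = 256
    _seed_table = [struct.unpack("<Q", os.urandom(8))[0]
                   for _ in range(TABLE_SIZE)]

    MASK = (1 << 64) - 1  # simulate 64-bit C arithmetic

    def randomized_hash(s, rounds=5):
        # Same shape as the current string hash: x starts from the
        # first character, then x = f(x, c) for each character c.
        if not s:
            return 0
        n = len(s)
        x = (ord(s[0]) << 7) & MASK
        for i in range(n):
            x = ((1000003 * x) ^ ord(s[i])) & MASK  # the existing f(x, c)
            if i < rounds or i >= n - rounds:
                # The proposed extra step: xor in a table entry selected
                # by the current value of x, but only near the ends of
                # the string, so long strings pay almost nothing extra.
                x ^= _seed_table[x % TABLE_SIZE]
        x ^= n  # the current hash also mixes in the length at the end
        return x

Since each table index depends on the evolving value of x, an attacker
who controls the input still can't predict which entries get mixed in
without recovering the whole table.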
> We could also consider a less computationally expensive operation than
> the modulo for calculating the lookup index, like simply truncating to
> the correct number of bits.

Sure.
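If the table size is a power of two (as in the sketch above), that
truncation is just a bitwise AND with size - 1, which produces the same
index as the modulo without a division (example values only):

    TABLE_SIZE = 256  # a power of two, so the mask trick applies

    x = 0x9E3779B97F4A7C15  # some intermediate hash value, for example

    index_mod = x % TABLE_SIZE         # division-based
    index_and = x & (TABLE_SIZE - 1)   # a single AND; identical result
    assert index_mod == index_and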
Thanks for thinking about all the details here!!

--
--Guido van Rossum (python.org/~guido)