Chris,
I entirely agree. The same questioner also asked which data type is the fastest to use as a key in a dictionary, and which data structure is fastest. I get the impression the person is very into micro-optimization, without having profiled their application. It seems every choice is made based on the speed of a single operation, without considering how often that operation is used.

On 17/05/18 09:16, Chris Angelico wrote:
On Thu, May 17, 2018 at 5:21 PM, Anthony Flury via Python-Dev
<python-dev@python.org> wrote:
Victor,
Thanks for the link, but to be honest it will just confuse people - neither
the link nor the related bpo entries state that the fix is limited to
strings. They simply talk about hash randomization - which in my opinion
implies ALL hash algorithms; which is why I asked the question.

I am not sure how much should be exposed about the scope of security fixes
but you can understand my (and others') confusion.

I am aware that applications shouldn't make assumptions about the value of
any given hash value - apart from some simple assumptions based on hash value
equality (i.e. if two objects have different hash values they can't be the
same value).
The hash values of Python objects are calculated by the __hash__
method, so arbitrary objects can do what they like, including
degenerate algorithms such as:

class X:
     def __hash__(self): return 7
Agreed - I should have said the default hash algorithm. Hashes for custom objects are entirely application dependent.

So it's impossible to randomize ALL hashes at the language level. Only
str and bytes hashes are randomized, because they're the ones most
likely to be exploitable - for instance, a web server will receive a
query like "http://spam.example/target?a=1&b=2&c=3"; and provide a
dictionary {"a":1, "b":2, "c":3}. Similarly, a JSON decoder is always
going to create string keys in its dictionaries (JSON objects). Do you
know of any situation in which an attacker can provide the keys for a
dict/set as integers?
I was just asking the question - rather than critiquing the fault-fix. I am actually more concerned that the documentation relating to the fix doesn't make it clear that only strings have their hashes randomised.
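
For anyone wanting to see the difference for themselves, here is a minimal sketch. It assumes CPython 3.7 or later, and the helper name hash_in_fresh_process is made up for illustration. It starts fresh interpreters with different PYTHONHASHSEED values and compares the hash of a str with the hash of an int:

```python
import os
import subprocess
import sys


def hash_in_fresh_process(expr: str, seed: str) -> int:
    """Return hash(<expr>) as computed by a new interpreter run with PYTHONHASHSEED=seed."""
    env = dict(os.environ, PYTHONHASHSEED=seed)
    result = subprocess.run(
        [sys.executable, "-c", f"print(hash({expr}))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout)


seeds = ("1", "2", "3")

# str hashes are salted by the seed, so they should differ between runs.
str_hashes = {hash_in_fresh_process("'spam'", seed) for seed in seeds}

# int hashes are not salted, so every run should report the same value.
int_hashes = {hash_in_fresh_process("12345", seed) for seed in seeds}

print("distinct str hashes:", len(str_hashes))
print("distinct int hashes:", len(int_hashes))
```

The seeds are fixed here only to make the runs repeatable; with no PYTHONHASHSEED set, each new process picks its own random salt.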

BTW:

This question was prompted by a question on a social media platform about
whether hash values are transferable across platforms.
Everything I could find stated that after Python 3.3 ALL hash values were
randomized - but that clearly isn't the case; and the original questioner
identified that some hash values are randomized and others aren't.
That's actually immaterial. Even if the hashes weren't actually
randomized, you shouldn't be making assumptions about anything
specific in the hash, save that *within one Python process*, two equal
values will have equal hashes (and therefore two objects with unequal
hashes will not be equal).
Entirely agree - I was just trying to get to the bottom of the difference - especially considering that the documentation I could find implied that all hash algorithms had been randomized.
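
As a small illustration of the one guarantee mentioned above (equal values have equal hashes within a single process), here is a sketch using built-in numeric types only; the variable names are just for illustration:

```python
# 1, 1.0 and True compare equal, so they must hash equal within this process.
equal_values = [1, 1.0, True]
hashes = {hash(value) for value in equal_values}

# Because of that, all three find the same dictionary entry.
lookup = {1: "one"}

print(len(hashes))
print(lookup[1.0], lookup[True])
```

Nothing here says what the hash value actually is, only that it is the same for values that compare equal.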
I did suggest strongly to the original questioner that relying on the same
hash value across different platforms wasn't a clever solution - their
original plan was to store hash values in a cross system database to enable
quick retrieval of data (!!!). I did remind the OP that a hash value wasn't
guaranteed to be unique anyway - and they might come across two different
values with the same hash - and no way to distinguish between them if all
they have is the hash. Hopefully their revised design will store the key,
not the hash.
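
To make the collision point concrete, here is a minimal sketch. It relies on a CPython implementation detail (hash(-1) and hash(-2) are both -2), so treat the specific values as an assumption about the interpreter rather than a language guarantee:

```python
# CPython detail: -1 is reserved as an error code internally, so hash(-1) is -2.
a, b = -1, -2
same_hash = hash(a) == hash(b)

# An index keyed only by the hash cannot tell the two values apart.
index_by_hash = {}
for value in (a, b):
    index_by_hash.setdefault(hash(value), []).append(value)

print(same_hash)
print(index_by_hash)
```

If only the hash were stored, there would be no way to tell which of the two values was originally meant.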
Uhh.... if you're using a database, let the database do the work of
being a database. I don't know what this "cross system database" would
be implemented in, but if it's a proper multi-user relational database
engine like PostgreSQL, it's already going to have way better indexing
than anything you'd do manually. I think there are WAY better
solutions than worrying about Python's inbuilt hashing.
Agreed
If you MUST hash your data for sharing and storage, the easiest
solution is to just use a cryptographic hash straight out of
hashlib.py.
As stated before - I think the original questioner was intent on micro-optimizations - and they had hit on the idea that storing an integer would be quicker than storing a string - entirely ignoring both the impracticality of trying to encode all strings as a hash value (since hashes aren't guaranteed not to collide), and the issue of trying to reverse that translation once the stored key had been retrieved.
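
If a hash really does have to be shared between systems, the hashlib suggestion might look something like the sketch below. The function name stable_digest and the choice of SHA-256 with UTF-8 encoding are assumptions for illustration, not something the original questioner specified:

```python
import hashlib


def stable_digest(key: str) -> str:
    """Return a hex digest that does not depend on the process, platform or PYTHONHASHSEED."""
    # Encode explicitly so every system hashes the same bytes.
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


digest = stable_digest("spam")
print(digest)
```

Even then the digest cannot be reversed to recover the key, so the key itself still needs to be stored alongside it.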
ChrisA
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/anthony.flury%40btinternet.com

Thanks for your comments :-)

--
Anthony Flury
email : *anthony.fl...@btinternet.com*
Twitter : *@TonyFlury <https://twitter.com/TonyFlury/>*
