On 8/27/07, Jim Jewett <[EMAIL PROTECTED]> wrote:
> On 8/26/07, Guido van Rossum <[EMAIL PROTECTED]> wrote:
> > But I'm wondering if passing a Unicode string to the various hash
> > digest functions should work at all! Hashes are defined on sequences
> > of bytes, and IMO we should insist on the user to pass us bytes, and
> > not second-guess what to do with Unicode.
>
> Conceptually, unicode *by itself* can't be represented as a buffer.
>
> What can be represented is a unicode string + an encoding. The
> question is whether the hash function needs to know the encoding to
> figure out the hash.
>
> If you're hashing arbitrary bytes, then it doesn't really matter --
> there is no expectation that a recoding should have the same hash.
>
> For hashing as a shortcut to __ne__, it does matter for text.
>
> Unfortunately, for historical reasons, plenty of code grabs the string
> buffer expecting text.

Such code is broken, and this will be an error soon. I think this
handles all the other issues -- as promised, *any* operation that
mixes str and bytes (or anything else supporting the buffer API) will
fail with a TypeError unless an encoding is specified explicitly.

> For dict comparisons, we really ought to specify the equality (and
> therefore hash) in terms of a canonical equivalent, encoded in X (It
> isn't clear to me that X should be UTF-8 in particular, but the main
> thing is to pick something.)

No, dict keys can't be bytes or buffers.

> The alternative is that defensive code will need to do a (normally
> useless boilerplate) decode/canonicalize/reencode dance before
> dictionary checks and insertions.
>
> I would rather see that boilerplate done once in the unicode type (and
> again in any equivalent types, if need be), because
> (1) most storage type/encodings would be able to take shortcuts.
> (2) if people don't do the defensive coding, the bugs will be very obscure

There is no dance.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)
_______________________________________________
Python-3000 mailing list
Python-3000@python.org
http://mail.python.org/mailman/listinfo/python-3000
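
For readers who want to see the rule from the reply above in action, here is a minimal sketch, assuming a current Python 3 interpreter rather than the 2007 development branch under discussion. The sample string and the variable names are illustrative and do not come from the thread. It shows that a digest function rejects text, that the caller has to choose an encoding explicitly, and that mixing str and bytes raises TypeError.

```python
import hashlib

text = "caf\u00e9"  # illustrative sample text containing a non-ASCII character

# Hashing text directly is rejected: digests are defined over bytes.
try:
    hashlib.sha256(text)
except TypeError:
    str_rejected = True
else:
    str_rejected = False

# The caller picks the encoding explicitly. Different encodings produce
# different byte sequences, so the digests differ as well.
utf8_digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
utf16_digest = hashlib.sha256(text.encode("utf-16-le")).hexdigest()

# Mixing str and bytes in one operation also fails with TypeError.
try:
    text + b"!"
except TypeError:
    mix_rejected = True
else:
    mix_rejected = False

print(str_rejected, mix_rejected, utf8_digest != utf16_digest)
```

Because the two digests differ, a hash of text is only meaningful together with the encoding that produced the bytes, which is why the encoding has to be stated by the caller rather than guessed.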