Re: unicode and hashlib

2008-12-02 Thread Bryan Olson
Scott David Daniels wrote: Bryan Olson wrote: ... I think that's good behavior, except that the error message is likely to send beginners to look up the obscure buffer interface before they find they just need mystring.decode('utf8') or bytes(mystring, 'utf8'). Oops, careful here (I made this

Re: unicode and hashlib

2008-12-01 Thread Bryan Olson
Jeff H wrote: [...] So once I have character strings transformed internally to unicode objects, I should encode them in 'utf-8' before attempting to do things that guess at the proper way to encode them for further processing (i.e. hashlib). It looks like hashlib in Python 3 will not even
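
A minimal sketch of the point above, assuming Python 3 (where str is a unicode string): hashlib refuses a str outright, and hashing works once the text is explicitly encoded. The variable names are only illustrative.

```python
import hashlib

text = "\u00a6"  # the broken-bar character from the original error

# Python 3's hashlib does not guess an encoding for text.
try:
    hashlib.md5(text)
    rejected = False
except TypeError:
    rejected = True

# Encoding explicitly (UTF-8 here, as suggested above) yields bytes to hash.
digest = hashlib.md5(text.encode("utf-8")).hexdigest()
```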

Re: unicode and hashlib

2008-12-01 Thread Scott David Daniels
Bryan Olson wrote: ... I think that's good behavior, except that the error message is likely to send beginners to look up the obscure buffer interface before they find they just need mystring.decode('utf8') or bytes(mystring, 'utf8'). Oops, careful here (I made this mistake once in this thread

Re: unicode and hashlib

2008-11-30 Thread Scott David Daniels
Jeff H wrote: ... decode vs. encode: You decode from one character set to a unicode object. You encode from a unicode object to a specified character set. Pretty close: encode: Think of characters as conceptual -- you encode a character string into a bunch of bytes (unicode -> bytes) in order
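
The direction of each operation can be sketched as follows, assuming Python 3 names (str for characters, bytes for raw bytes); the sample string is an arbitrary choice.

```python
# encode: characters -> bytes; decode: bytes -> characters.
text = "caf\u00e9"                 # four characters
data = text.encode("utf-8")        # str -> bytes
round_trip = data.decode("utf-8")  # bytes -> str

# The accented character takes two bytes in UTF-8, so the lengths differ.
char_count = len(text)
byte_count = len(data)
```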

Re: unicode and hashlib

2008-11-29 Thread Jeff H
On Nov 28, 1:24 pm, Scott David Daniels wrote: Jeff H wrote: hashlib.md5 does not appear to like unicode, UnicodeEncodeError: 'ascii' codec can't encode character u'\xa6' in position 1650: ordinal not in range(128) After googling, I've found BDFL and others on Py3K

Re: unicode and hashlib

2008-11-29 Thread Jeff H
On Nov 28, 2:03 pm, Terry Reedy wrote: Jeff H wrote: hashlib.md5 does not appear to like unicode, UnicodeEncodeError: 'ascii' codec can't encode character u'\xa6' in position 1650: ordinal not in range(128) It is the (default) ascii encoder that does not like non-ascii

Re: unicode and hashlib

2008-11-29 Thread Jeff H
On Nov 29, 8:27 am, Jeff H wrote: On Nov 28, 2:03 pm, Terry Reedy wrote: Jeff H wrote: hashlib.md5 does not appear to like unicode, UnicodeEncodeError: 'ascii' codec can't encode character u'\xa6' in position 1650: ordinal not in range(128)

Re: unicode and hashlib

2008-11-29 Thread Marc 'BlackJack' Rintsch
On Sat, 29 Nov 2008 06:51:33 -0800, Jeff H wrote: Actually, what I am surprised by is the fact that hashlib cares at all about the encoding. An md5 hash can be produced for an .iso file, which means it can handle bytes; why does it care what it is being fed, as long as there are bytes? But

Re: unicode and hashlib

2008-11-29 Thread Scott David Daniels
Jeff H wrote: ... Actually, what I am surprised by is the fact that hashlib cares at all about the encoding. An md5 hash can be produced for an .iso file, which means it can handle bytes; why does it care what it is being fed, as long as there are bytes? I would have assumed that it would take
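
For the file case mentioned above, a rough sketch (Python 3 assumed): a file opened in binary mode already yields bytes, so no character encoding is involved. The helper name md5_of_file and the temporary file are made up for illustration.

```python
import hashlib
import os
import tempfile

def md5_of_file(path, chunk_size=65536):
    """Hypothetical helper: hash a file's raw bytes in fixed-size chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:  # binary mode: read() returns bytes
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Write a small throwaway file so the sketch is self-contained.
payload = b"\xa6 raw bytes, no character encoding involved"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
    tmp_path = tmp.name

file_digest = md5_of_file(tmp_path)
os.remove(tmp_path)
```

A unicode string is different from such a file: it has no byte representation until an encoding is chosen, which is why hashlib cannot treat it the same way.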

Re: unicode and hashlib

2008-11-29 Thread Scott David Daniels
Scott David Daniels wrote: ... If you now, and for all time, decide that the only source you will take is cp1252, perhaps you should decode to cp1252 before hashing. Of course my dyslexia sticks out here as I get encode and decode exactly backwards -- Marc 'BlackJack' Rintsch has it right.
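
With the direction corrected (encode, not decode), the cp1252 idea might look like the sketch below, assuming Python 3. It also shows why one encoding has to be fixed once and for all: the same character hashed under two encodings gives two different digests.

```python
import hashlib

text = "\u00a6"  # broken bar, the character from the original traceback

# The same character becomes different bytes under different encodings.
cp1252_bytes = text.encode("cp1252")
utf8_bytes = text.encode("utf-8")

# Different bytes in, different digests out.
cp1252_digest = hashlib.md5(cp1252_bytes).hexdigest()
utf8_digest = hashlib.md5(utf8_bytes).hexdigest()
```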

Re: unicode and hashlib

2008-11-29 Thread Jeff H
On Nov 29, 12:23 pm, Scott David Daniels wrote: Scott David Daniels wrote: ... If you now, and for all time, decide that the only source you will take is cp1252, perhaps you should decode to cp1252 before hashing. Of course my dyslexia sticks out here as I get encode

unicode and hashlib

2008-11-28 Thread Jeff H
hashlib.md5 does not appear to like unicode, UnicodeEncodeError: 'ascii' codec can't encode character u'\xa6' in position 1650: ordinal not in range(128) After googling, I've found BDFL and others on Py3K talking about the problems of hashing non-bytes (i.e. buffers)

Re: unicode and hashlib

2008-11-28 Thread Scott David Daniels
Jeff H wrote: hashlib.md5 does not appear to like unicode, UnicodeEncodeError: 'ascii' codec can't encode character u'\xa6' in position 1650: ordinal not in range(128) After googling, I've found BDFL and others on Py3K talking about the problems of hashing non-bytes (i.e. buffers) ...

Re: unicode and hashlib

2008-11-28 Thread MRAB
Jeff H wrote: hashlib.md5 does not appear to like unicode, UnicodeEncodeError: 'ascii' codec can't encode character u'\xa6' in position 1650: ordinal not in range(128) After googling, I've found BDFL and others on Py3K talking about the problems of hashing non-bytes (i.e. buffers)

Re: unicode and hashlib

2008-11-28 Thread Terry Reedy
Jeff H wrote: hashlib.md5 does not appear to like unicode, UnicodeEncodeError: 'ascii' codec can't encode character u'\xa6' in position 1650: ordinal not in range(128) It is the (default) ascii encoder that does not like non-ascii chars. I suspect that if you encode to bytes first with an
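
The diagnosis above can be reproduced with an explicit encode call. This is a sketch in Python 3 syntax (the original error came from Python 2's implicit ascii conversion), and latin-1 is just one encoder that happens to cover u'\xa6'.

```python
import hashlib

text = "\u00a6"

# The ascii codec cannot represent code points above 127.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# An encoder that covers the character produces bytes that md5 accepts.
latin1_bytes = text.encode("latin-1")
digest = hashlib.md5(latin1_bytes).hexdigest()
```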

Re: unicode and hashlib

2008-11-28 Thread Paul Boddie
On 28 Nov, 21:03, Terry Reedy wrote: It is the (default) ascii encoder that does not like non-ascii chars. I suspect that if you encode to bytes first with an encoder that does work (latin-???), md5 will be happy. I know that the Python roadmap answer to such questions might