Russell Keith-Magee created LIBCLOUD-233:
--------------------------------------------
Summary: Atmos storage driver doesn't correctly encode path names
Key: LIBCLOUD-233
URL: https://issues.apache.org/jira/browse/LIBCLOUD-233
Project: Libcloud
Issue Type: Bug
Components: Storage
Affects Versions: 0.10.1
Environment: Python 2.7.1
Reporter: Russell Keith-Magee
Priority: Critical
If you use the Atmos storage driver to stream the upload of an object, and
either the container name or the object name is a unicode string, that unicode
string causes the HTTP message body to be converted into a unicode string as
well.
However, file content is provided as a byte string. If the file content
contains binary data, httplib tries to decode it as ASCII in order to build
the unicode message, and fails with a UnicodeDecodeError.
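To illustrate the conversion described above, here is a minimal sketch that
performs the ASCII decode explicitly. It is not httplib's code; it only mimics
the decode that Python 2 applies implicitly when a unicode string and a byte
string are combined. The helper name first_undecodable_position and the sample
body are made up for illustration.

```python
# Illustration only: an explicit version of the ASCII decode that Python 2
# performs implicitly when unicode text is combined with a byte string.

# Hypothetical sample body: the first bytes of a PDF-like file.
pdf_body = b'%PDF-1.3\n%\xc4\xe5\xf2\xe5'


def first_undecodable_position(data):
    """Return the index of the first byte that is not ASCII, or None."""
    try:
        data.decode('ascii')
    except UnicodeDecodeError as exc:
        # exc.start is the offset of the first byte the codec rejected.
        return exc.start
    return None


if __name__ == '__main__':
    print(first_undecodable_position(pdf_body))
```

For this sample body the helper reports the first byte after the ASCII-only
header, which is where real binary content would begin.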
For example, if you try to stream upload a PDF whose name is stored as
u'foo.pdf', you'll get a message something like:
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc4 in position 10:
    ordinal not in range(128)
(Position 10 is where the binary content in a PDF starts, after the
"%PDF-1.3\n%" header)
The behaviour of httplib in the presence of unicode content is a known issue
(http://bugs.python.org/issue12398). All path tokens should be encoded as ASCII
before being passed to httplib to prevent this problem from occurring.
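As a rough sketch of what encoding the path tokens could look like (this is
not the Atmos driver's actual code, and encode_path_token and
build_object_path are hypothetical names), each token can be converted to
UTF-8 bytes and then percent-encoded, which leaves only ASCII characters.
The choice of UTF-8 is an assumption.

```python
# Sketch only: turn container/object names into ASCII-only path tokens
# before they reach httplib. Helper names are hypothetical.
try:
    from urllib import quote  # Python 2
except ImportError:
    from urllib.parse import quote  # Python 3


def encode_path_token(token):
    """Return `token` percent-encoded, containing only ASCII characters."""
    if not isinstance(token, bytes):
        # Assumption: non-ASCII names are transmitted as UTF-8.
        token = token.encode('utf-8')
    return quote(token)


def build_object_path(container, name):
    """Join the encoded container and object names into a request path."""
    return '/' + '/'.join(encode_path_token(t) for t in (container, name))


if __name__ == '__main__':
    print(build_object_path(u'docs', u'foo.pdf'))
```

Under Python 2 the result is a plain byte string, so combining it with binary
file content no longer triggers an implicit ASCII decode.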
As far as I can make out, this problem only exists under Python 2.7 -- I've
observed it on Python 2.7.1 and Python 2.7.3. Python 2.6.7 is unaffected.