Richard Xia created LIBCLOUD-903:
------------------------------------

             Summary: AWS S3 upload_object_via_stream fails on non-file 
iterable due to missing Content-Length header
                 Key: LIBCLOUD-903
                 URL: https://issues.apache.org/jira/browse/LIBCLOUD-903
             Project: Libcloud
          Issue Type: Bug
            Reporter: Richard Xia


The issue I am seeing appears to be due to the incorrect integration of 4 
separate libraries, but I believe the real problem is here in libcloud, in the 
{{upload_object_via_stream()}} method on the S3 storage driver.
                                                                                
 
I am using Python 3.5.1, and the four libraries I am using are:

* Django 1.10.6
* django-storages 1.5.2
* libcloud v2.0.0rc1-tentative
* requests 2.13.0
 
                                                                                
 
Specifically, when I try to use a Django 
[ContentFile|https://docs.djangoproject.com/en/1.10/ref/files/file/#django.core.files.base.ContentFile],
 Django's own file-like wrapper for strings, to save a new file to S3 via the 
Libcloud backend of django-storages, I get the following error:
                                                                                
 
{code:xml}
<?xml version="1.0" encoding="UTF-8"?>
<Error>
  <Code>NotImplemented</Code>
  <Message>A header you provided implies functionality that is not implemented</Message>
  <Header>Transfer-Encoding</Header>
  <RequestId>A2FC4D5109083076</RequestId>
  <HostId>K9WGhd18iqQHyIyv+GxWcxHexvapVSidTtHzSqujtT9nT5LhmIEygMKOfR/7F0v7ujnlE/CoYiM=</HostId>
</Error>
{code}
 
                                                                                
 
This happens because Libcloud generates an HTTP request to AWS S3 that is 
missing the {{Content-Length}} header. Without a known length, the body is sent 
with {{Transfer-Encoding: chunked}}, which is the header S3 rejects in the 
error above. AWS S3 requires the {{Content-Length}} header for file uploads 
*unless* the upload is a multi-part upload. This is why this used to work on 
the 1.5.0 release of {{libcloud}}: even single-part uploads were done as a 
one-part multi-part upload.
                                                                                
 
I've traced my bug down through all four libraries and have determined exactly 
why the {{Content-Length}} header is missing in my particular use case. The 
{{upload_object_via_stream()}} method has an {{iterator}} argument that should 
yield the content body data, and it eventually passes that argument directly to 
the {{requests}} library. The {{requests}} library will actually [try very hard to 
add the {{Content-Length}} 
header|https://github.com/kennethreitz/requests/blob/c43fefa7ed535c41ba7d58021f0f16ed5ba1d584/requests/models.py#L471],
even for certain types of iterator streams. In particular, it can determine the 
length of file-like objects which support stat operations, and it can handle 
StringIO/BytesIO objects. However, the Django {{ContentFile}} is neither, and 
{{requests}} cannot extract the length of the stream without consuming the 
iterator, so it does not try.
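
To illustrate the kind of length detection described above, here is a minimal sketch. It is not the actual {{requests}} code: {{guess_length}} is a hypothetical helper name, and its checks only approximate the behaviour (sized objects, stat-able files, seekable buffers, and nothing else).

```python
import io
import os


def guess_length(body):
    """Return the length of body in bytes if it can be found without
    consuming it, otherwise None. Hypothetical helper, for illustration."""
    # Sized objects such as bytes or bytearray report their own length.
    if hasattr(body, "__len__"):
        return len(body)

    # Real files can be stat-ed through their file descriptor
    # (this reports the total file size).
    if hasattr(body, "fileno"):
        try:
            return os.fstat(body.fileno()).st_size
        except (io.UnsupportedOperation, OSError, AttributeError):
            pass  # e.g. BytesIO has fileno() but no descriptor behind it

    # Seekable in-memory buffers: measure the remaining bytes, then
    # restore the original position.
    if hasattr(body, "seek") and hasattr(body, "tell"):
        try:
            current = body.tell()
            body.seek(0, os.SEEK_END)
            end = body.tell()
            body.seek(current)
            return end - current
        except (OSError, ValueError):
            pass

    # Plain iterators and generators: the length is unknown unless
    # the stream is consumed.
    return None


def one_chunk():
    yield b"hello world"


print(guess_length(io.BytesIO(b"hello world")))  # 11
print(guess_length(one_chunk()))                 # None
```

A generator falls through every check, which matches the case in this report: nothing about it reveals a size up front.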
                                                                                
 
                                                                                
 
Here's some (Python 3) code to demonstrate the bug:                             
 
                                                                                
 
{code:python}
from io import BytesIO


class MyWrapper(object):
    """A contrived wrapper that acts similar to BytesIO."""

    def __init__(self, content):
        self.content = BytesIO(content)

    def __iter__(self):
        # Yield the whole body as a single chunk. The resulting generator
        # exposes no length, so Content-Length cannot be derived from it.
        self.content.seek(0)
        yield self.content.read()


# Assume driver is already set to some S3 provider w/ credentials
container = driver.get_container(container_name='my-container')
driver.upload_object_via_stream(iterator=iter(MyWrapper(b'hello world')),
                                container=container,
                                object_name='my_file.txt')
{code}
 
                                                                                
 
I think the proper solution to this will require all calls to the S3 
{{upload_object_via_stream()}} to use the multi-part uploader in order to 
avoid the need for the {{Content-Length}} header. If desired, you could make 
the same optimizations that the {{requests}} library makes by checking for 
certain common cases where you do know the file size, and only use the 
multi-part uploader when necessary.
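
As a rough illustration of why the multi-part route avoids the problem, the sketch below regroups an iterator of unknown total length into buffered parts, each of which has a known size when it is sent. This is not libcloud's implementation: {{iter_parts}} is a hypothetical name, and the default part size is an assumption based on S3's documented 5 MiB minimum for every part except the last.

```python
def iter_parts(stream, part_size=5 * 1024 * 1024):
    """Regroup an iterable of byte chunks into parts of exactly
    part_size bytes; only the final part may be shorter.
    Hypothetical helper, for illustration only."""
    buffer = bytearray()
    for chunk in stream:
        buffer.extend(chunk)
        # Emit as many full parts as the buffer currently holds.
        while len(buffer) >= part_size:
            yield bytes(buffer[:part_size])
            del buffer[:part_size]
    # Whatever is left over becomes the final, possibly shorter, part.
    # An empty stream yields no parts at all and would need separate handling.
    if buffer:
        yield bytes(buffer)


# A tiny part size is used here only to make the regrouping visible.
chunks = iter([b"hello ", b"world"])
for number, part in enumerate(iter_parts(chunks, part_size=4), start=1):
    print(number, len(part))  # prints "1 4", "2 4", "3 3"
```

Because each part is fully buffered before it is sent, its own length is always known, even though the total length of the stream is not.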



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
