RunOrVeith opened a new issue #1544:
URL: https://github.com/apache/libcloud/issues/1544


   ## Summary
   
   The Google Storage driver crashes with a `KeyError` in `get_object` because Google does not send the `content-length` header in some cases.
   
   ```
  File "/home/veith/Projects/python-flexi-path/venv/lib/python3.8/site-packages/libcloud/storage/drivers/s3.py", line 370, in get_object
    obj = self._headers_to_object(object_name=object_name,
  File "/home/veith/Projects/python-flexi-path/venv/lib/python3.8/site-packages/libcloud/storage/drivers/s3.py", line 1066, in _headers_to_object
    obj = Object(name=object_name, size=headers['content-length'],
KeyError: 'content-length'

Process finished with exit code 1
   
   ```
   
   ## Detailed Information
   
   Libcloud version 3.3.0
   Ubuntu 18.04
   Python 3.8
   
   Google Storage does not send the `content-length` header when all of the following hold:
   
   - The file has `content-encoding: gzip`
   - The request header contains `Accept-Encoding: gzip`
   - The file is **not** publicly accessible (i.e. `allUsers` does not have read rights in the bucket)
   
   This causes the driver to crash as soon as you call `get_object` on such a file, because the HEAD request that is sent does not return `content-length`, which `_headers_to_object` reads directly.
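   The failure mode can be illustrated with a plain dict standing in for the HEAD response headers (the header values below are made up, but the shape matches the report: the vendor header is present, `content-length` is not):
   
   ```python
   # Headers as Google Storage returns them for a gzip-encoded, non-public
   # object: the stored length is there, but "content-length" is missing.
   headers = {
       "etag": '"3858f62230ac3c915f300c664312c63f"',  # illustrative value
       "content-type": "text/plain",
       "content-encoding": "gzip",
       "x-goog-stored-content-length": "73",
   }
   
   # This is effectively what _headers_to_object does today, and why it fails:
   try:
       size = headers["content-length"]
   except KeyError as exc:
       print("KeyError:", exc)  # KeyError: 'content-length'
   
   # The vendor header is still present, so the size is recoverable:
   print(headers.get("x-goog-stored-content-length"))  # 73
   ```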
   
   *Code to reproduce*:
   
   ```python
   import gzip
   from libcloud.storage.providers import get_driver
   from libcloud.storage.types import Provider
   
   
   def create_gzipped_file():
       content = b"I am a test file with super nice content that is so interesting to read"
       encoded_bytes = gzip.compress(content)
       file = "example_gzipped_file.txt"
       with open(file, mode="wb") as f:
           f.write(encoded_bytes)
       return file
   
   
   # Set your credentials here
   email = ""  # Add a service account email here 
   key = ""   # Add a key here
   
   # Make sure this container is not publicly readable!
   container_name = "ml-integration-tests"
   
   # Upload a dummy file with gzip encoding
   file_path = create_gzipped_file()
   driver = get_driver(Provider.GOOGLE_STORAGE)(key=email, secret=key)
   container = driver.get_container(container_name=container_name)
   driver.upload_object(
       file_path=file_path,
       container=container,
       object_name=file_path,
       extra={"content_type": "text/plain"},
       headers={"Content-Encoding": "gzip"},
   )
   
   # KeyError raised here when we try to access the file
   # Note that connection.request sets the headers to accept gzip
   #     headers.update({'Accept-Encoding': 'gzip,deflate'})
   # in common/base.py line 574
   driver.get_object(container_name=container_name, object_name=file_path)
   ```
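   
   Until the driver handles the missing header, a caller-side guard can at least keep the `KeyError` from crashing the whole program. `get_object_or_none` is a hypothetical helper written for this report, not part of libcloud:
   
   ```python
   def get_object_or_none(driver, container_name, object_name):
       """Return the object, or None if the driver cannot parse the headers.

       Caller-side workaround: swallows the KeyError that
       _headers_to_object raises when content-length is absent.
       """
       try:
           return driver.get_object(container_name=container_name,
                                    object_name=object_name)
       except KeyError:
           return None
   ```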
   
   
   **Workaround**:
   Change `_headers_to_object` to fall back to `x-goog-stored-content-length` when it cannot find `content-length`, because Google seems to always send the former. This is quite ugly, however, because as it stands the change would have to be made in `s3.py`, which is also used by the AWS driver. A short method that can be overridden by the Google driver would probably be better.
   The code above works if you make this change:
   
   ```python
    def _headers_to_object(self, object_name, container, headers):
        hash = headers['etag'].replace('"', '')
        extra = {'content_type': headers['content-type'],
                 'etag': headers['etag']}
        meta_data = {}

        if 'last-modified' in headers:
            extra['last_modified'] = headers['last-modified']

        for key, value in headers.items():
            if not key.lower().startswith(self.http_vendor_prefix + '-meta-'):
                continue

            key = key.replace(self.http_vendor_prefix + '-meta-', '')
            meta_data[key] = value

        content_length = headers.get('content-length',
                                     headers.get('x-goog-stored-content-length'))
        if content_length is None:
            raise KeyError(f"Can not deduce object size from headers "
                           f"for {object_name}")

        obj = Object(name=object_name, size=content_length,
                     hash=hash, extra=extra,
                     meta_data=meta_data,
                     container=container,
                     driver=self)
        return obj
   ```
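   
   The overridable-method alternative could look roughly like this. The class and method names below are a hypothetical sketch for discussion, not libcloud APIs:
   
   ```python
   class BaseS3DriverSketch:
       """Stand-in for the S3 driver; only the size lookup is shown."""

       def _get_object_size(self, object_name, headers):
           # Default behaviour: S3 always sends content-length.
           try:
               return int(headers["content-length"])
           except KeyError:
               raise KeyError(
                   "Can not deduce object size from headers for %s"
                   % object_name
               )


   class GoogleStorageDriverSketch(BaseS3DriverSketch):
       def _get_object_size(self, object_name, headers):
           # Google omits content-length for gzip-transcoded, non-public
           # objects, but sends x-goog-stored-content-length instead.
           size = headers.get(
               "content-length",
               headers.get("x-goog-stored-content-length"),
           )
           if size is None:
               raise KeyError(
                   "Can not deduce object size from headers for %s"
                   % object_name
               )
           return int(size)
   ```
   
   `_headers_to_object` would then call `self._get_object_size(object_name, headers)`, so the Google-specific fallback stays out of the shared AWS code path.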
   
   We have also [opened an issue](https://issuetracker.google.com/issues/177896087) in the Google Cloud Storage issue tracker, as the [documentation for "Transcoding of gzip-compressed files"](https://cloud.google.com/storage/docs/transcoding) reads like `content-length` should be sent when the request header contains `Accept-Encoding: gzip`.
   Regardless of whether this is a bug in libcloud or in Google Storage, I don't think libcloud should crash here.
   
   Let me know if you need to know anything else.
   

