RunOrVeith opened a new issue #1544:
URL: https://github.com/apache/libcloud/issues/1544
## Summary
The Google Storage driver crashes with a `KeyError` in `get_object` because
Google does not send the `content-length` header in some cases.
```
File "/home/veith/Projects/python-flexi-path/venv/lib/python3.8/site-packages/libcloud/storage/drivers/s3.py", line 370, in get_object
    obj = self._headers_to_object(object_name=object_name,
File "/home/veith/Projects/python-flexi-path/venv/lib/python3.8/site-packages/libcloud/storage/drivers/s3.py", line 1066, in _headers_to_object
    obj = Object(name=object_name, size=headers['content-length'],
KeyError: 'content-length'
Process finished with exit code 1
```
## Detailed Information
- Libcloud version: 3.3.0
- OS: Ubuntu 18.04
- Python: 3.8
Google Storage does not send the `content-length` header when all of the
following hold:
- the object has `content-encoding: gzip`
- the request header contains `Accept-Encoding: gzip`
- the object is **not** publicly accessible (i.e. `allUsers` does **not** have
  read rights on the bucket)
This causes the driver to crash as soon as you call `get_object` on such an
object, because the HEAD request that is sent does not return
`content-length`, which `_headers_to_object` reads directly from the headers
dict.
*Code to reproduce*:
```python
import gzip

from libcloud.storage.providers import Provider, get_driver


def create_gzipped_file():
    content = (b"I am a test file with super nice content "
               b"that is so interesting to read")
    encoded_bytes = gzip.compress(content)
    file = "example_gzipped_file.txt"
    with open(file, mode="wb") as f:
        f.write(encoded_bytes)
    return file


# Set your credentials here
email = ""  # Add a service account email here
key = ""  # Add a key here
# Make sure this container is not publicly readable!
container_name = "ml-integration-tests"

# Upload a dummy file with gzip encoding
file_path = create_gzipped_file()
driver = get_driver(Provider.GOOGLE_STORAGE)(key=email, secret=key)
container = driver.get_container(container_name=container_name)
driver.upload_object(file_path=file_path, container=container,
                     object_name=file_path,
                     extra={"content_type": "text/plain"},
                     headers={"Content-Encoding": "gzip"})

# KeyError raised here when we try to access the file.
# Note that connection.request sets the request headers to accept gzip:
#     headers.update({'Accept-Encoding': 'gzip,deflate'})
# in common/base.py line 574.
driver.get_object(container_name=container_name, object_name=file_path)
```
**Workaround**:
Change `_headers_to_object` to fall back to `x-goog-stored-content-length`
when it can't find `content-length`, because Google seems to always send that
header. This is quite ugly, though, because as things stand the change would
have to go into `s3.py`, which is also used by the AWS driver. A short method
that can be overridden by the Google driver would probably be better.
The code above works if you make this change:
```python
def _headers_to_object(self, object_name, container, headers):
    hash = headers['etag'].replace('"', '')
    extra = {'content_type': headers['content-type'],
             'etag': headers['etag']}
    meta_data = {}

    if 'last-modified' in headers:
        extra['last_modified'] = headers['last-modified']

    for key, value in headers.items():
        if not key.lower().startswith(self.http_vendor_prefix + '-meta-'):
            continue
        key = key.replace(self.http_vendor_prefix + '-meta-', '')
        meta_data[key] = value

    # Fall back to the Google-specific header when content-length is absent
    content_length = headers.get('content-length',
                                 headers.get('x-goog-stored-content-length'))
    if content_length is None:
        raise KeyError(
            f"Cannot deduce object size from headers for {object_name}")

    obj = Object(name=object_name, size=content_length,
                 hash=hash, extra=extra,
                 meta_data=meta_data,
                 container=container,
                 driver=self)
    return obj
```
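The override-based variant suggested above could look roughly like this. This is only a sketch, not the actual libcloud class hierarchy, and `_get_object_size` is a hypothetical helper name:

```python
class BaseS3StorageDriver:
    """Stand-in for the shared S3 driver in s3.py."""

    def _get_object_size(self, object_name, headers):
        # Default behavior: S3 always sends content-length on HEAD.
        return headers["content-length"]


class GoogleStorageDriver(BaseS3StorageDriver):
    """Stand-in for the Google Storage driver subclass."""

    def _get_object_size(self, object_name, headers):
        # Google omits content-length for gzip-encoded private objects,
        # but appears to always include x-goog-stored-content-length.
        size = headers.get("content-length",
                           headers.get("x-goog-stored-content-length"))
        if size is None:
            raise KeyError(
                f"Cannot deduce object size from headers for {object_name}")
        return size
```

With this split, `_headers_to_object` in `s3.py` would call `self._get_object_size(object_name, headers)` instead of indexing the dict directly, so the AWS code path stays unchanged while the Google driver gets the fallback.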
We have also [opened an
issue](https://issuetracker.google.com/issues/177896087) for this in the
Google Cloud Storage issue tracker, as the [documentation for "Transcoding of
gzip-compressed files"](https://cloud.google.com/storage/docs/transcoding)
suggests that `content-length` should be sent when the request header
contains `Accept-Encoding: gzip`.
Regardless of whether this is a bug in libcloud or in Google Storage, I don't
think libcloud should crash here.
Let me know if you need to know anything else.