luispcunha opened a new issue, #36725:
URL: https://github.com/apache/arrow/issues/36725
### Describe the bug, including details regarding any error messages, version, and platform.
Seeking to the end of a file (`f.seek(0, 2)`) opened from a Google Cloud Storage
filesystem fails, while seeking to one byte before the end (`f.seek(-1, 2)`) works.
Code to reproduce the error:
```python
from pyarrow import fs
import zipfile
filesystem = fs.GcsFileSystem()
with filesystem.open_input_file("bucket/archive.zip") as f:
    size = f.size()
    f.seek(-1, 2)  # works
    f.seek(0, 2)  # fails
```
Traceback (here triggered through `zipfile.ZipFile(f)`, which calls `seek(0, 2)` internally):
```python
Traceback (most recent call last):
  File ".../test.py", line 18, in <module>
    zip = zipfile.ZipFile(f)
          ^^^^^^^^^^^^^^^^^^
  File ".../lib/python3.11/zipfile.py", line 1302, in __init__
    self._RealGetContents()
  File ".../lib/python3.11/zipfile.py", line 1365, in _RealGetContents
    endrec = _EndRecData(fp)
             ^^^^^^^^^^^^^^^
  File ".../lib/python3.11/zipfile.py", line 292, in _EndRecData
    fpin.seek(0, 2)
  File "pyarrow/io.pxi", line 323, in pyarrow.lib.NativeFile.seek
  File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: google::cloud::Status(OUT_OF_RANGE: Permanent error ReadObjectNotWrapped: <?xml version='1.0' encoding='UTF-8'?><Error><Code>InvalidRange</Code><Message>The requested range cannot be satisfied.</Message><Details>bytes=485-</Details></Error>)
```
The behavior is the same with every zip file I've tried. Performing the
same operation on the same zip file on a local file system works as expected.
I came across this while trying to use the `zipfile` module on a file in a
Google Cloud Storage filesystem: `zipfile` calls `f.seek(0, 2)` to
determine the size of the file. My end goal is something like the following:
```python
from pyarrow import fs
import zipfile
filesystem = fs.GcsFileSystem()
with filesystem.open_input_file("bucket/archive.zip") as f:
    archive = zipfile.ZipFile(f)
    members = archive.namelist()
```
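A possible workaround, sketched here only as an illustration: wrap the file in a small object that tracks the position itself and never asks the underlying file to seek to the end. `EndSeekSafeReader` is a hypothetical name, not part of PyArrow. It assumes only that the wrapped object provides `size()`, `seek(offset)` and `read(nbytes)`, as the file returned by `open_input_file` does. I have not verified this against a real GCS bucket.

```python
import io


class EndSeekSafeReader:
    """Hypothetical read-only wrapper that avoids seeking the raw file to EOF.

    The wrapper keeps its own position. The raw file is only asked to seek
    right before a read, and only to offsets strictly below the file size.
    """

    def __init__(self, raw):
        self._raw = raw
        self._size = raw.size()  # assumed to work, as in the report above
        self._pos = 0

    def readable(self):
        return True

    def seekable(self):
        return True

    def tell(self):
        return self._pos

    def seek(self, offset, whence=io.SEEK_SET):
        # Resolve the target position locally; no call to the raw file here.
        if whence == io.SEEK_SET:
            base = 0
        elif whence == io.SEEK_CUR:
            base = self._pos
        elif whence == io.SEEK_END:
            base = self._size
        else:
            raise ValueError(f"invalid whence: {whence!r}")
        new_pos = base + offset
        if new_pos < 0:
            raise ValueError("negative seek position")
        self._pos = new_pos
        return self._pos

    def read(self, nbytes=-1):
        remaining = self._size - self._pos
        if remaining <= 0:
            return b""  # at or past EOF: nothing to read, nothing to seek
        if nbytes is None or nbytes < 0 or nbytes > remaining:
            nbytes = remaining
        if nbytes == 0:
            return b""
        self._raw.seek(self._pos)  # always strictly below the file size
        data = self._raw.read(nbytes)
        self._pos += len(data)
        return data
```

With such a wrapper, the snippet above would become `archive = zipfile.ZipFile(EndSeekSafeReader(f))`. Each read then costs an extra seek on the raw file, which may or may not matter for remote storage.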
**Environment:** PyArrow 12.0.1, Python 3.11, macOS
### Component(s)
Python