thvasilo opened a new issue, #37001: URL: https://github.com/apache/arrow/issues/37001
### Describe the bug, including details regarding any error messages, version, and platform.

I'm trying to use a [localstack](https://docs.localstack.cloud/getting-started/quickstart/)-created S3 bucket as a way to test my application without interacting with S3. To do that, I launch an S3 endpoint using `localstack start -d` and create my PyArrow S3FS using:

```
s3_fs = fs.S3FileSystem(endpoint_override="localhost:4566")
```

However, when I try to interact with files on the simulated bucket, I get the following:

```
In [223]: nrows = pq.read_metadata(f"{file_bucket}/{file_key}", filesystem=s3_fs).num_rows
---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
<ipython-input-223-a51dff0bbcaa> in <module>
----> 1 nrows = pq.read_metadata(f"{file_bucket}/{file_key}", filesystem=s3_fs).num_rows

/[...]/lib/python3.7/site-packages/pyarrow/parquet/core.py in read_metadata(where, memory_map, decryption_properties, filesystem)
   3479         file_ctx = nullcontext()
   3480         if filesystem is not None:
-> 3481             file_ctx = where = filesystem.open_input_file(where)
   3482
   3483         with file_ctx:

[...]/python3.7/site-packages/pyarrow/_fs.pyx in pyarrow._fs.FileSystem.open_input_file()

[...]/lib/python3.7/site-packages/pyarrow/error.pxi in pyarrow.lib.pyarrow_internal_check_status()

[...]/lib/python3.7/site-packages/pyarrow/error.pxi in pyarrow.lib.check_status()

OSError: When reading information for key 'redacted/path/to/file' in bucket 'example-bucket': AWS Error NETWORK_CONNECTION during HeadObject operation: curlCode: 60, SSL peer certificate or SSH remote key was not OK
```

Another user seems to have the same problem when using on-prem S3, and had to use `s3fs` along with `PyFileSystem, FSSpecHandler` to resolve it: https://discuss.ray.io/t/ssl-peer-certificate-or-ssh-remote-key-was-not-ok/11091/2

Fully reproducible example:

```
pip install localstack awscli-local pyarrow
localstack start -d
awslocal s3 mb s3://example-bucket

# Write a small Parquet file locally
python <<HEREDOC
import numpy as np
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

df = pd.DataFrame({'one': [-1, np.nan, 2.5],
                   'two': ['foo', 'bar', 'baz'],
                   'three': [True, False, True]},
                  index=list('abc'))
table = pa.Table.from_pandas(df)
pq.write_table(table, 'example.parquet')
HEREDOC

awslocal s3 cp example.parquet s3://example-bucket/

# Read the file's metadata back through the localstack endpoint
python <<HEREDOC
from pyarrow import fs
import pyarrow.parquet as pq

s3_fs = fs.S3FileSystem(endpoint_override="localhost:4566")
pq.read_metadata("example-bucket/example.parquet", filesystem=s3_fs)
HEREDOC
```

This results in:

```
Traceback (most recent call last):
  File "<stdin>", line 4, in <module>
  File "[.../]lib/python3.9/site-packages/pyarrow/parquet/core.py", line 3481, in read_metadata
    file_ctx = where = filesystem.open_input_file(where)
  File "pyarrow/_fs.pyx", line 770, in pyarrow._fs.FileSystem.open_input_file
  File "pyarrow/error.pxi", line 144, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 115, in pyarrow.lib.check_status
OSError: When reading information for key 'example.parquet' in bucket 'example-bucket': AWS Error NETWORK_CONNECTION during HeadObject operation: curlCode: 60, SSL peer certificate or SSH remote key was not OK
```

### Component(s)

Python
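One thing that may be worth ruling out: `S3FileSystem` takes a `scheme` argument that defaults to `"https"`, so with only `endpoint_override="localhost:4566"` the client attempts a TLS handshake against the local endpoint. The sketch below is a hypothetical helper (the names `s3_kwargs_for_endpoint` and `make_local_s3_fs` are not part of PyArrow) that derives both arguments from a full endpoint URL. It assumes the localstack endpoint accepts plain HTTP on port 4566; whether this resolves the curlCode 60 error reported here has not been verified.

```python
from urllib.parse import urlsplit


def s3_kwargs_for_endpoint(endpoint_url: str) -> dict:
    """Split a full endpoint URL into S3FileSystem keyword arguments.

    Hypothetical helper: requires an explicit http:// or https:// prefix so
    the transport scheme is never left to the library default.
    """
    parts = urlsplit(endpoint_url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(
            f"expected a URL like 'http://host:port', got {endpoint_url!r}"
        )
    return {"endpoint_override": parts.netloc, "scheme": parts.scheme}


def make_local_s3_fs(endpoint_url: str = "http://localhost:4566"):
    """Build an S3FileSystem for a local endpoint (assumed to speak plain HTTP)."""
    # Imported lazily so the URL helper above works without pyarrow installed.
    from pyarrow import fs

    return fs.S3FileSystem(**s3_kwargs_for_endpoint(endpoint_url))
```

With this in place, the failing call would be retried as `pq.read_metadata("example-bucket/example.parquet", filesystem=make_local_s3_fs())`. If the error persists over plain HTTP, the cause lies elsewhere.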
