maubarsom opened a new issue, #37888:
URL: https://github.com/apache/arrow/issues/37888
### Describe the bug, including details regarding any error messages, version, and platform.
Bug seen in pyarrow version 12.0.0 on macOS Ventura 13.6, Apple M1 Pro.
# Description
The error first surfaced in `pandas` but was traced to `pyarrow`, as
described in the screenshot. When I try to read an existing file from
`S3` with my credentials stored in the `~/.aws` directory (the `credentials`
and `config` files), pyarrow raises the following error:
```
OSError: When getting information for key 'XXX/YYY.parquet' in bucket
'ZZZZZ': AWS Error ACCESS_DENIED during HeadObject operation: No response body.
```
**Expected result**: The file is successfully read.
**Note:** This error DOES NOT occur if the credentials are set as
environment variables (instead of being read from `~/.aws/credentials`). If they
are set as environment variables, pyarrow successfully reads the parquet file.
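To make the difference between the two setups easier to report, a small stdlib-only check like the one below can list which credential sources are visible to the process. This is only a sketch: the function name `credential_sources` is hypothetical, and it assumes the standard AWS SDK conventions (`AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_PROFILE`, `AWS_SHARED_CREDENTIALS_FILE`, `AWS_CONFIG_FILE`, and the `~/.aws` default location). It does not call pyarrow and does not show which source pyarrow actually picks.

```python
import os
from pathlib import Path


def credential_sources(env=None, home=None):
    """Summarise which AWS credential sources are visible (hypothetical helper).

    `env` and `home` can be overridden for testing; by default the real
    process environment and home directory are inspected.
    """
    env = os.environ if env is None else env
    home = Path(home) if home is not None else Path.home()

    # The SDK lets these two variables relocate the shared files.
    creds_file = Path(env.get("AWS_SHARED_CREDENTIALS_FILE",
                              home / ".aws" / "credentials"))
    config_file = Path(env.get("AWS_CONFIG_FILE", home / ".aws" / "config"))

    return {
        "env_keys_set": ("AWS_ACCESS_KEY_ID" in env
                         and "AWS_SECRET_ACCESS_KEY" in env),
        "profile": env.get("AWS_PROFILE", "default"),
        "credentials_file": str(creds_file),
        "credentials_file_exists": creds_file.is_file(),
        "config_file": str(config_file),
        "config_file_exists": config_file.is_file(),
    }


if __name__ == "__main__":
    for key, value in credential_sources().items():
        print(f"{key}: {value}")
```

Running it once in the failing setup and once with the environment variables exported should show whether the only difference is `env_keys_set`.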
**Note 2:** As shown in the screenshot, I managed to circumvent the issue in
**pandas** by explicitly passing `storage_options={"anon":False}`. However,
a similar approach in `pyarrow`, explicitly setting
`filesystem=S3FileSystem(anonymous=False)`, did not succeed and resulted in the
same error.
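One further thing that could be tried is bypassing the credential lookup entirely by reading the keys from the credentials file and passing them to `S3FileSystem` directly. The sketch below is untested against this bug and is not a confirmed fix. The helper names (`load_aws_profile`, `s3_kwargs`, `read_parquet_from_s3`) are hypothetical, and it assumes the credentials file uses the usual `aws_access_key_id` / `aws_secret_access_key` / `aws_session_token` keys. `access_key`, `secret_key`, `session_token`, and `region` are existing `pyarrow.fs.S3FileSystem` arguments.

```python
import configparser
from pathlib import Path


def load_aws_profile(profile="default", path=None):
    """Return one profile of an AWS credentials file as a dict (hypothetical helper)."""
    path = Path(path) if path is not None else Path.home() / ".aws" / "credentials"
    parser = configparser.ConfigParser()
    if not parser.read(path):
        raise FileNotFoundError(f"no readable credentials file at {path}")
    if not parser.has_section(profile):
        raise KeyError(f"profile {profile!r} not found in {path}")
    return dict(parser[profile])


def s3_kwargs(creds, region=None):
    """Map credentials-file keys to S3FileSystem keyword arguments."""
    kwargs = {
        "access_key": creds["aws_access_key_id"],
        "secret_key": creds["aws_secret_access_key"],
    }
    # Temporary credentials also carry a session token.
    if "aws_session_token" in creds:
        kwargs["session_token"] = creds["aws_session_token"]
    if region:
        kwargs["region"] = region
    return kwargs


def read_parquet_from_s3(path, profile="default", region=None):
    """Read 'bucket/key.parquet' using keys taken from the credentials file."""
    # Imported lazily so the helpers above work without pyarrow installed.
    import pyarrow.parquet as pq
    from pyarrow import fs

    s3 = fs.S3FileSystem(**s3_kwargs(load_aws_profile(profile), region))
    return pq.read_table(path, filesystem=s3)
```

For example, `read_parquet_from_s3("ZZZZZ/XXX/YYY.parquet")` would use the `default` profile. If this succeeds where the plain call fails, that would suggest the problem lies in how the credentials are discovered rather than in the credentials themselves.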
# Screenshot

The traceback:
```
File
~/.mambaforge/envs/datasci/lib/python3.11/site-packages/pyarrow/parquet/core.py:2939,
in read_table(source, columns, use_threads, metadata, schema,
use_pandas_metadata, read_dictionary, memory_map, buffer_size, partitioning,
filesystem, filters, use_legacy_dataset, ignore_prefixes, pre_buffer,
coerce_int96_timestamp_unit, decryption_properties, thrift_string_size_limit,
thrift_container_size_limit)
2932 raise ValueError(
2933 "The 'metadata' keyword is no longer supported with the new "
2934 "datasets-based implementation. Specify "
2935 "'use_legacy_dataset=True' to temporarily recover the old "
2936 "behaviour."
2937 )
2938 try:
-> 2939 dataset = _ParquetDatasetV2(
2940 source,
2941 schema=schema,
2942 filesystem=filesystem,
2943 partitioning=partitioning,
2944 memory_map=memory_map,
2945 read_dictionary=read_dictionary,
2946 buffer_size=buffer_size,
2947 filters=filters,
2948 ignore_prefixes=ignore_prefixes,
2949 pre_buffer=pre_buffer,
2950 coerce_int96_timestamp_unit=coerce_int96_timestamp_unit,
2951 thrift_string_size_limit=thrift_string_size_limit,
2952 thrift_container_size_limit=thrift_container_size_limit,
2953 )
2954 except ImportError:
2955 # fall back on ParquetFile for simple cases when pyarrow.dataset
2956 # module is not available
2957 if filters is not None:
File
~/.mambaforge/envs/datasci/lib/python3.11/site-packages/pyarrow/parquet/core.py:2465,
in _ParquetDatasetV2.__init__(self, path_or_paths, filesystem, filters,
partitioning, read_dictionary, buffer_size, memory_map, ignore_prefixes,
pre_buffer, coerce_int96_timestamp_unit, schema, decryption_properties,
thrift_string_size_limit, thrift_container_size_limit, **kwargs)
2463 except ValueError:
2464 filesystem = LocalFileSystem(use_mmap=memory_map)
-> 2465 finfo = filesystem.get_file_info(path_or_paths)
2466 if finfo.is_file:
2467 single_file = path_or_paths
File
~/.mambaforge/envs/datasci/lib/python3.11/site-packages/pyarrow/_fs.pyx:571, in
pyarrow._fs.FileSystem.get_file_info()
File
~/.mambaforge/envs/datasci/lib/python3.11/site-packages/pyarrow/error.pxi:144,
in pyarrow.lib.pyarrow_internal_check_status()
File
~/.mambaforge/envs/datasci/lib/python3.11/site-packages/pyarrow/error.pxi:115,
in pyarrow.lib.check_status()
```
### Component(s)
Parquet, Python