[
https://issues.apache.org/jira/browse/ARROW-17985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17628366#comment-17628366
]
Joris Van den Bossche edited comment on ARROW-17985 at 11/3/22 2:55 PM:
------------------------------------------------------------------------
In addition to better docs (ARROW-18238), we could maybe try to improve the
error message by specifically checking for a mismatching bucket region
when the HeadBucket request fails.
The current code is basically the following (with my suggestion added in
comments):
{code:cpp}
auto outcome = impl_->client_->HeadBucket(req);
if (!outcome.IsSuccess()) {
  if (!IsNotFound(outcome.GetError())) {
    // <--- here, if we are going to raise an error, we could check
    // impl_->client_->GetBucketRegion(..) with impl_->options().region
    // and if not matching, adapt the error message to hint at this?
    return ErrorToStatus(
        std::forward_as_tuple("When getting information for bucket '",
                              path.bucket, "': "),
        "HeadBucket", outcome.GetError());
  }
  info.set_type(FileType::NotFound);
  return info;
}
{code}
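To make the suggestion a bit more concrete, here is a rough Python sketch of just the message logic. It is not Arrow's C++ code: the function name and its parameters are hypothetical, and `actual_region` stands in for whatever a bucket-region lookup would return (`None` if that lookup itself fails).

```python
from typing import Optional


def bucket_error_message(bucket: str, aws_error: str,
                         configured_region: str,
                         actual_region: Optional[str]) -> str:
    """Build the error text, adding a hint when the regions differ.

    Hypothetical helper, for illustration only.
    """
    message = f"When getting information for bucket '{bucket}': {aws_error}"
    # Only hint when the lookup succeeded and the regions really differ.
    if actual_region is not None and actual_region != configured_region:
        message += (
            f" (the filesystem is configured for region '{configured_region}'"
            f" but the bucket appears to be in '{actual_region}';"
            " try passing the region explicitly)"
        )
    return message
```

When the regions match, or the lookup fails, the message stays exactly as it is today, so the hint only appears when it is likely to help.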
> [Python][C++] Opaque error code ([code: 100]), when not setting region
> ----------------------------------------------------------------------
>
> Key: ARROW-17985
> URL: https://issues.apache.org/jira/browse/ARROW-17985
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++, Python
> Reporter: Vedant Roy
> Priority: Minor
>
> A few odd things are going on with the Python bindings:
> # Statefulness. I ran the following code:
> {code:java}
> import os
> import pyarrow.fs as arrow_fs
>
> def fs_():
>     s3_fs = arrow_fs.S3FileSystem(
>         access_key="<token>",
>         secret_key="<token>",
>         endpoint_override="<cloudflare r2 url>",
>     )
>     return s3_fs
>
> fs = fs_()
> print(fs.get_file_info("data"))
> {code}
> and it worked on one machine but not the other. Only setting
> {code:java}
> region="auto"
> {code}
> allowed the code to work consistently on both computers.
> Furthermore, the error message is very opaque:
> {code:java}
> Traceback (most recent call last):
>   File "cluster_scripts/test_s3.py", line 51, in <module>
>     print(fs.get_file_info("data"))
>   File "pyarrow/_fs.pyx", line 439, in pyarrow._fs.FileSystem.get_file_info
>   File "pyarrow/error.pxi", line 143, in pyarrow.lib.pyarrow_internal_check_status
>   File "pyarrow/error.pxi", line 114, in pyarrow.lib.check_status
> OSError: When getting information for bucket 'data': AWS Error [code 100]: No response body.
> {code}
> Googling this error gives no information whatsoever. I managed to figure out
> the issue by switching from Cloudflare to S3; when the error persisted there,
> I explicitly set a region. The experience was pretty painful.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)