Philipp Moritz created ARROW-17079:
--------------------------------------

             Summary: Improve error message propagation from AWS SDK
                 Key: ARROW-17079
                 URL: https://issues.apache.org/jira/browse/ARROW-17079
             Project: Apache Arrow
          Issue Type: Improvement
          Components: C++
    Affects Versions: 8.0.0
            Reporter: Philipp Moritz


Dear all,

I'd like to see if there is interest in improving the error messages that 
originate from the AWS SDK. Especially when loading datasets from S3, many 
things can go wrong, and the error messages that (Py)Arrow gives are not 
always actionable, especially if the call involves many different SDK 
functions. In particular, it would be great to have the following attached 
to each error message:
 * A machine-parseable status code from the AWS SDK
 * Information as to exactly which AWS SDK call failed, so that the failure 
can be pinpointed for Arrow API calls that issue multiple AWS SDK calls

In the ideal case, as a developer I could reconstruct the AWS SDK call that 
failed from the error message (e.g. in a form that allows me to run the API 
call via the "aws" CLI program) so I can debug errors and see how they relate 
to my AWS infrastructure. Any progress in this direction would be super helpful.

 

For context: I was recently debugging some permission issues in S3 based on 
the current error codes and it was pretty hard to figure out what was going on 
(see https://github.com/ray-project/ray/issues/19799#issuecomment-1185035602).

 

I'm happy to take a stab at this problem but might need some help. Is 
implementing a custom StatusDetail class for AWS errors and propagating errors 
that way the right approach here? 
https://github.com/apache/arrow/blob/50f6fcad6cc09c06e78dcd09ad07218b86e689de/cpp/src/arrow/status.h#L110
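
To make the question concrete, here is a minimal sketch of what such a detail 
class could look like. It is only an illustration under stated assumptions: 
the StatusDetail base below is a local stand-in that mirrors the two pure 
virtual methods (type_id and ToString) of the class in arrow/status.h so the 
snippet compiles on its own, and AwsErrorDetail, its fields, and its type_id 
string are hypothetical names, not an existing Arrow API. The operation name, 
error name, and numeric code are assumed to be copied from the SDK's error 
object at the point where the S3 call fails.

```cpp
#include <sstream>
#include <string>
#include <utility>

// Local stand-in for arrow::StatusDetail so this sketch is self-contained.
// In Arrow itself one would derive from the class declared in arrow/status.h.
class StatusDetail {
 public:
  virtual ~StatusDetail() = default;
  // Unique identifier for the concrete detail type.
  virtual const char* type_id() const = 0;
  // Human-readable description appended to the status message.
  virtual std::string ToString() const = 0;
};

// Hypothetical detail type carrying the pieces asked for above:
// a machine-parseable code and the name of the SDK call that failed.
class AwsErrorDetail : public StatusDetail {
 public:
  AwsErrorDetail(std::string operation, std::string error_name, int error_code,
                 std::string bucket, std::string key)
      : operation_(std::move(operation)),
        error_name_(std::move(error_name)),
        error_code_(error_code),
        bucket_(std::move(bucket)),
        key_(std::move(key)) {}

  // Hypothetical identifier; the real name would be chosen in review.
  const char* type_id() const override { return "arrow::fs::AwsErrorDetail"; }

  std::string ToString() const override {
    std::ostringstream out;
    out << "AWS call " << operation_ << " failed with " << error_name_
        << " (code " << error_code_ << ") for s3://" << bucket_ << "/" << key_;
    return out.str();
  }

  // Accessors so callers can branch on the error without parsing text.
  const std::string& operation() const { return operation_; }
  const std::string& error_name() const { return error_name_; }
  int error_code() const { return error_code_; }
  const std::string& bucket() const { return bucket_; }
  const std::string& key() const { return key_; }

 private:
  std::string operation_;   // SDK call name, e.g. "HeadObject"
  std::string error_name_;  // error name as reported by the SDK
  int error_code_;          // numeric error code as reported by the SDK
  std::string bucket_;
  std::string key_;
};
```

With something like this attached to the returned Status, a caller could 
inspect operation() and error_code() programmatically, while ToString() would 
still give a readable message that names the failing call and the S3 object.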

 

All the best,

Philipp.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
