thvasilo opened a new issue, #37260:
URL: https://github.com/apache/arrow/issues/37260

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   We have a process that reads the metadata for multiple files using 
`joblib`'s `threading` backend. Occasionally we observe the following error:
   
   ```
   Traceback (most recent call last):
   [rest of call stack]
     in get_rows_for_parquet_file
       nrows = pq.read_metadata(
     File "/usr/local/lib/python3.9/site-packages/pyarrow/parquet/core.py", line 3481, in read_metadata
       file_ctx = where = filesystem.open_input_file(where)
     File "pyarrow/_fs.pyx", line 770, in pyarrow._fs.FileSystem.open_input_file
     File "pyarrow/error.pxi", line 144, in pyarrow.lib.pyarrow_internal_check_status
     File "pyarrow/error.pxi", line 115, in pyarrow.lib.check_status
   
   OSError: When reading information for key 'path/to/my/file' in bucket 'my-bucket': AWS Error UNKNOWN (HTTP status 503) during HeadObject operation: No response body.
   ``` 
   
   I've tried setting the number of retries to 10 when creating the 
`S3FileSystem`, but we still occasionally get this error. I would expect such 
an error if the file did not exist, but that's not the case here, so I'm 
wondering if a different error (like `503 S3 Slow Down`) is being hidden by the 
system.
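
For context, a minimal sketch of an application-level retry that could wrap the metadata read as a stopgap. It is not part of pyarrow: the helper name `call_with_retries` and its parameters are hypothetical, and it assumes the transient 503 surfaces as a plain `OSError`, as in the traceback above.

```python
import random
import time


def call_with_retries(func, *args, max_attempts=5, base_delay=0.5,
                      sleep=time.sleep, **kwargs):
    """Call func(*args, **kwargs), retrying on OSError with exponential backoff.

    Hypothetical helper for illustration only; not a pyarrow API.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return func(*args, **kwargs)
        except FileNotFoundError:
            # A missing file is not transient, so do not retry it.
            raise
        except OSError:
            if attempt == max_attempts:
                # Out of attempts: surface the original error unchanged.
                raise
            # Exponential backoff with jitter, so that concurrent threads
            # do not all retry at the same moment.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay))
```

It would be used roughly as `call_with_retries(pq.read_metadata, path, filesystem=fs)`, where `path` and `fs` stand in for our own file path and filesystem object. This only masks the problem; it does not reveal what the underlying S3 error is.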
   
   Any ideas what the root cause of such an error could be, and whether it's 
possible to surface it?
   
   ### Component(s)
   
   Python

