wiedld opened a new issue, #13679:
URL: https://github.com/apache/datafusion/issues/13679

   ### Is your feature request related to a problem or challenge?
   
   ParquetSink (used for `COPY TO`) encodes bytes to parquet and writes them to the sink (e.g. an object store). It currently does not include retry logic for failed multipart PUTs to the object store. We feel this is a gap, since the write path is exposed to transient network issues.
   
   ### Describe the solution you'd like
   
   Add the ability to automatically retry failed PUTs issued through the ParquetSink. This could be a configurable option.
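   A configurable retry option could be as simple as a retry-with-backoff loop around each PUT. The sketch below uses only the standard library; the `RetryPolicy` type, its field names, and the closure-based `retry_with_backoff` helper are hypothetical and are not part of the DataFusion or `object_store` APIs.

   ```rust
   use std::thread::sleep;
   use std::time::Duration;

   /// Hypothetical retry policy; a real implementation would expose
   /// something like this as a DataFusion configuration option.
   pub struct RetryPolicy {
       pub max_retries: usize,
       pub initial_backoff: Duration,
   }

   /// Retry a fallible operation with exponential backoff.
   /// `op` stands in for a single multipart PUT to the object store.
   pub fn retry_with_backoff<T, E>(
       policy: &RetryPolicy,
       mut op: impl FnMut() -> Result<T, E>,
   ) -> Result<T, E> {
       let mut backoff = policy.initial_backoff;
       let mut attempts = 0;
       loop {
           match op() {
               Ok(v) => return Ok(v),
               Err(_) if attempts < policy.max_retries => {
                   attempts += 1;
                   sleep(backoff);
                   backoff *= 2; // double the wait between retries
               }
               // retries exhausted: propagate the last error to the caller
               Err(e) => return Err(e),
           }
       }
   }

   fn main() {
       let policy = RetryPolicy { max_retries: 3, initial_backoff: Duration::from_millis(5) };
       // Simulated PUT that fails twice, then succeeds on the third attempt.
       let mut calls = 0;
       let result: Result<u32, &str> = retry_with_backoff(&policy, || {
           calls += 1;
           if calls < 3 { Err("transient network error") } else { Ok(calls) }
       });
       assert_eq!(result, Ok(3));
   }
   ```

   An alternative design is to cap total elapsed time rather than attempt count, which bounds worst-case `COPY TO` latency; either knob could be surfaced as a config option.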
   
   ### Describe alternatives you've considered
   
   N/A
   
   ### Additional context
   
   _How easy would it be to add PUT retry logic to ParquetSink?_
   * If we want retry logic on the store-upload step only, that [occurs here in the ParquetSink](https://github.com/apache/datafusion/blob/fc703238b1d7794bd132a7fb6b97cad9ba4c7446/datafusion/core/src/datasource/file_format/parquet.rs#L1160-L1162). I believe the returned error includes the failure from the upload to the object store.
   * Specifically, `write_all` uses [BufWriter::poll_write](https://github.com/apache/arrow-rs/blob/63ad87a8d79ecc14247297ddf0ff7707d4da284c/object_store/src/buffered.rs#L359-L404), which passes through the [object store multipart_put errors](https://github.com/apache/arrow-rs/blob/63ad87a8d79ecc14247297ddf0ff7707d4da284c/object_store/src/buffered.rs#L390).
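   Since the multipart errors surface through that `BufWriter` path, one place a retry layer could sit is a decorator around the upload itself, so each part is retried before the failure ever reaches ParquetSink. The sketch below is std-only and illustrative: `PartUpload` is a stand-in trait, not the real `object_store` multipart interface, and `RetryingUpload` is a hypothetical wrapper.

   ```rust
   /// Stand-in for a per-part upload interface. The real multipart-upload
   /// machinery lives in the `object_store` crate; this trait is illustrative only.
   trait PartUpload {
       fn put_part(&mut self, data: &[u8]) -> Result<(), String>;
   }

   /// Decorator that retries each part up to `max_retries` extra times
   /// before propagating the error up toward BufWriter/ParquetSink.
   struct RetryingUpload<U: PartUpload> {
       inner: U,
       max_retries: usize,
   }

   impl<U: PartUpload> PartUpload for RetryingUpload<U> {
       fn put_part(&mut self, data: &[u8]) -> Result<(), String> {
           let mut last_err = String::new();
           for _ in 0..=self.max_retries {
               match self.inner.put_part(data) {
                   Ok(()) => return Ok(()),
                   Err(e) => last_err = e, // remember the failure and retry
               }
           }
           Err(last_err) // retries exhausted
       }
   }

   fn main() {
       // A fake store that fails twice, then succeeds.
       struct Flaky { fails_left: usize, parts_written: usize }
       impl PartUpload for Flaky {
           fn put_part(&mut self, _data: &[u8]) -> Result<(), String> {
               if self.fails_left > 0 {
                   self.fails_left -= 1;
                   Err("transient network error".to_string())
               } else {
                   self.parts_written += 1;
                   Ok(())
               }
           }
       }
       let mut upload = RetryingUpload {
           inner: Flaky { fails_left: 2, parts_written: 0 },
           max_retries: 3,
       };
       assert!(upload.put_part(b"part-1").is_ok());
       assert_eq!(upload.inner.parts_written, 1);
   }
   ```

   A caveat for the real implementation: retrying a single part is only safe if the part upload is idempotent from the store's point of view, which is worth verifying against each backend's multipart semantics.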
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

