thinkharderdev opened a new issue, #6476:
URL: https://github.com/apache/arrow-datafusion/issues/6476

   ### Is your feature request related to a problem or challenge?
   
   If you are scanning a large number of files, a single missing or corrupt 
file can cause the entire query to fail. This can happen in a number of 
different scenarios:
   1. File is deleted between the time of listing and time of scan
   2. Listing is done from a catalog which is only eventually consistent with 
underlying data source
   3. A partial/failed write creates a file that is unreadable. 
   
   ### Describe the solution you'd like
   
   In some cases, failing the entire query is the desired result, but in other 
cases it would be better to simply move on and scan what we can (surfacing 
the failed scans as a metric for visibility). 
   
   `FileStream` can take an enum:
   ```rust
   /// Describes the behavior of the `FileStream` if file opening or scanning 
fails
   enum OnError {
     /// Continue scanning, ignoring the failed file
     Skip,
     /// Fail the entire stream and return the underlying error
     Fail
   }
   ```
   
   For our use case it isn't strictly necessary, but it might also be useful 
to decompose this into separate error-handling configurations for failed 
opening and failed scanning. 
   
   ### Describe alternatives you've considered
   
   We could leave the current behavior unchanged
   
   ### Additional context
   
   _No response_
