thinkharderdev opened a new issue, #6476:
URL: https://github.com/apache/arrow-datafusion/issues/6476
### Is your feature request related to a problem or challenge?
If you are scanning a large number of files, a single missing or corrupt
file can cause the entire query to fail. This can happen in a number of
different scenarios:
1. File is deleted between the time of listing and time of scan
2. Listing is done from a catalog which is only eventually consistent with
underlying data source
3. A partial/failed write creates a file which is unreadable.
### Describe the solution you'd like
In some cases, failing the entire query is the desired result, but in other
cases it would be better to simply move on and scan what we can (and surface
the failed scans as a metric for visibility).
`FileStream` can take an enum:
```rust
/// Describes the behavior of the `FileStream` if file opening or scanning fails
enum OnError {
    /// Continue scanning, ignoring the failed file
    Skip,
    /// Fail the entire stream and return the underlying error
    Fail,
}
```
For our use case it is not strictly necessary, but it might also be useful to
decompose this into separate error-handling configurations for failed opening
versus failed scanning.
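As a rough illustration of the intended semantics, here is a minimal, self-contained sketch (not DataFusion's actual `FileStream` implementation; the function name and types are hypothetical) of how per-file results could be drained under each policy, while counting skipped files so they can be surfaced as a metric:

```rust
// Hypothetical sketch: per-file scan results are modeled as `Result` values,
// and the `OnError` policy decides whether a failure aborts the stream.

#[derive(Clone, Copy, PartialEq)]
enum OnError {
    /// Continue scanning, ignoring the failed file
    Skip,
    /// Fail the entire stream and return the underlying error
    Fail,
}

/// Collect successful per-file results, applying the error policy.
/// Returns the collected values plus a count of skipped files
/// (the "failed scans" metric mentioned above).
fn collect_with_policy(
    results: Vec<Result<u32, String>>,
    on_error: OnError,
) -> Result<(Vec<u32>, usize), String> {
    let mut batches = Vec::new();
    let mut skipped = 0;
    for res in results {
        match res {
            Ok(batch) => batches.push(batch),
            Err(e) => match on_error {
                OnError::Skip => skipped += 1,   // record the failure and move on
                OnError::Fail => return Err(e),  // abort the whole stream
            },
        }
    }
    Ok((batches, skipped))
}
```

With `OnError::Skip`, a corrupt file only increments the skipped counter; with `OnError::Fail`, the first error propagates and the query fails as it does today.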
### Describe alternatives you've considered
We could leave the current behavior unchanged
### Additional context
_No response_