schenksj opened a new issue, #4529: URL: https://github.com/apache/datafusion-comet/issues/4529
### Problem

When a native Parquet scan hits a corrupt footer, a truncated/empty file, or a deleted file, Comet rethrows the raw DataFusion / object_store message:

- `Parquet error: ...` (corrupt footer etc.)
- `Requested range was invalid` (0-byte / truncated file)
- `Object at location ... not found` (deleted file)

Spark's own reader surfaces these as `FAILED_READ_FILE.NO_HINT` carrying the offending file path, and tests/tools assert on that shape. Comet's native path does **not** go through Spark's `FileScanRDD`, so `InputFileBlockHolder` is usually unpopulated and the path is missing from any wrapped error.

### Proposed fix

- `CometExecIterator.isFileReadError` classifies file-read failures by matching those specific IO phrasings -- deliberately **not** the broad `Generic <Store> error:` prefix, which also covers non-file config errors (e.g. `Generic HadoopFileSystem error: Hdfs support is not enabled in this build`) that must surface as-is.
- `ShimSparkErrorConverter.wrapNativeParquetError` (in both the spark-3.5 and spark-4.x shims) wraps the cause via `QueryExecutionErrors.cannotReadFilesError(cause, path)`.
- Thread per-partition file paths from `CometNativeScanExec` -> `CometNativeExec` / `CometExecRDD` -> `CometExecIterator` so the wrapped error names the actual file, with an `InputFileBlockHolder` fallback for any path that does populate it.

### Relationship to the Delta integration

Standalone error-compatibility improvement for all native Parquet scans. It is **required for** the in-progress Delta Lake contrib integration (Delta's corrupt-file / broken-checkpoint suites assert the `FAILED_READ_FILE` message and path), so it would help to prioritize it accordingly.

A PR will follow shortly.
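
For illustration only, the classification idea in the proposed fix could be sketched as below. This is a minimal standalone sketch and not the code from the forthcoming PR: the class name `FileReadErrorSketch`, the method signatures, and the `FileReadException` stand-in are all hypothetical, and the real change would live in `CometExecIterator` and wrap via `QueryExecutionErrors.cannotReadFilesError(cause, path)` instead.

```java
import java.util.regex.Pattern;

// Hypothetical sketch: none of these names come from the Comet code base.
final class FileReadErrorSketch {

    // "Object at location <path> not found" with an arbitrary path in the middle.
    private static final Pattern OBJECT_NOT_FOUND =
        Pattern.compile("Object at location .* not found", Pattern.DOTALL);

    // Stand-in for Spark's wrapped error; it only carries the path and the cause.
    static final class FileReadException extends RuntimeException {
        final String path;

        FileReadException(String path, Throwable cause) {
            super("Failed to read file: " + path, cause);
            this.path = path;
        }
    }

    // True only for the three specific IO phrasings listed in the issue.
    // The broad "Generic <Store> error:" prefix is deliberately not matched,
    // so config errors pass through unchanged.
    static boolean isFileReadError(String message) {
        if (message == null) {
            return false;
        }
        return message.contains("Parquet error:")
            || message.contains("Requested range was invalid")
            || OBJECT_NOT_FOUND.matcher(message).find();
    }

    // Wraps file-read failures with the offending path when one is known;
    // every other error is returned as-is.
    static Throwable wrapIfFileReadError(Throwable cause, String path) {
        if (path != null && isFileReadError(cause.getMessage())) {
            return new FileReadException(path, cause);
        }
        return cause;
    }

    public static void main(String[] args) {
        Throwable raw = new RuntimeException("Requested range was invalid");
        Throwable wrapped = wrapIfFileReadError(raw, "/data/part-00000.parquet");
        System.out.println(wrapped.getMessage());
    }
}
```

Matching on specific phrasings keeps the classifier conservative: an unrecognized message is left untouched rather than being mislabeled as a file-read failure.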
