schenksj commented on PR #4536: URL: https://github.com/apache/datafusion-comet/pull/4536#issuecomment-4587466215
Thanks for the thorough review, @andygrove — and for pulling the CI logs and tracing each failure. All points addressed: **1. spark-3.4 shim.** Added the missing `CannotReadFile` case (3.4's `cannotReadFilesError(Throwable, String)` matches 3.5; the `FileNotFound` case was already present on 3.4). **2. Version-dependent message assertions.** The two `SparkErrorConverterSuite` tests and the `CometExecSuite` e2e test now assert on the version-stable message — `"Encountered error while reading file …"` — instead of the `FAILED_READ_FILE` literal, which only Spark 4.x prepends to `getMessage`. (Went with the stable substring over a `getErrorClass`/`getCondition` test shim; glad to switch if you'd rather assert the error class.) **3. ignoreCorruptFiles / ignoreMissingFiles.** Two parts: - *Behavior:* no gap — `CometNativeScan` already declines the native scan and falls back to Spark when either config is enabled (`CometNativeScan.scala:75-87`), so Spark's skip semantics are preserved. - *Wording:* `cannot_read_file_message` now appends Spark's phrasing (`"is not a Parquet file"`) for parquet-rs's magic/footer error (`"Invalid Parquet file. Corrupt footer"`), so `ParquetQuerySuite`'s assertion matches. The outer `cannotReadFilesError` wrapper is unchanged, so it composes with Spark's tests and doesn't disturb the existing message expectations. **4. `IoError` scoping.** Dropped the unconditional `IoError` arm — scans surface read failures as typed `ParquetError`/`ObjectStore`, so a bare `IoError` (spill/shuffle) is no longer relabelled `FAILED_READ_FILE`. **5. Double `try_classify` call.** Tidied — the DataFusion arm is a single `if let Some(..)`, with the generic fallback extracted to `throw_generic_exception`. **Also** (a separate case surfaced by the same area): a missing file — an `object_store` `NotFound`, including when wrapped by the parquet reader as `ParquetError::External` — now classifies as `FileNotFound` → `readCurrentFileNotFoundError`, matching what Spark does for a file deleted between planning and execution. This fixes Delta's `DeltaVacuumSuite` "vacuum for cdc" suites. Added Rust classifier unit tests for the NotFound / wrapped-NotFound, corrupt-vs-missing, IoError, and magic-footer-wording cases. Verified locally on Spark 4.x and Spark 3.5 / Delta 3.3.2 (the `DeltaVacuumSuite` "vacuum for cdc" tests now pass). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
