schenksj commented on PR #4536:
URL: 
https://github.com/apache/datafusion-comet/pull/4536#issuecomment-4587466215

   Thanks for the thorough review, @andygrove — and for pulling the CI logs and 
tracing each failure. All points addressed:
   
   **1. spark-3.4 shim.** Added the missing `CannotReadFile` case (3.4's 
`cannotReadFilesError(Throwable, String)` matches 3.5; the `FileNotFound` case 
was already present on 3.4).
   
   **2. Version-dependent message assertions.** The two 
`SparkErrorConverterSuite` tests and the `CometExecSuite` e2e test now assert 
on the version-stable message — `"Encountered error while reading file …"` — 
instead of the `FAILED_READ_FILE` literal, which only Spark 4.x prepends to 
`getMessage`. (Went with the stable substring over a 
`getErrorClass`/`getCondition` test shim; glad to switch if you'd rather assert 
the error class.)
   
   **3. ignoreCorruptFiles / ignoreMissingFiles.** Two parts:
   - *Behavior:* no gap — `CometNativeScan` already declines the native scan 
and falls back to Spark when either config is enabled 
(`CometNativeScan.scala:75-87`), so Spark's skip semantics are preserved.
   - *Wording:* `cannot_read_file_message` now appends Spark's phrasing (`"is 
not a Parquet file"`) for parquet-rs's magic/footer error (`"Invalid Parquet 
file. Corrupt footer"`), so `ParquetQuerySuite`'s assertion matches. The outer 
`cannotReadFilesError` wrapper is unchanged, so it composes with Spark's tests 
and doesn't disturb the existing message expectations.
   
   **4. `IoError` scoping.** Dropped the unconditional `IoError` arm — scans 
surface read failures as typed `ParquetError`/`ObjectStore`, so a bare 
`IoError` (spill/shuffle) is no longer relabelled `FAILED_READ_FILE`.
   
   **5. Double `try_classify` call.** Tidied — the DataFusion arm is a single 
`if let Some(..)`, with the generic fallback extracted to 
`throw_generic_exception`.
   
   **Also** (a separate case surfaced by the same area): a missing file — an 
`object_store` `NotFound`, including when wrapped by the parquet reader as 
`ParquetError::External` — now classifies as `FileNotFound` → 
`readCurrentFileNotFoundError`, matching what Spark does for a file deleted 
between planning and execution. This fixes Delta's `DeltaVacuumSuite` "vacuum 
for cdc" suites.
   
   Added Rust classifier unit tests for the NotFound / wrapped-NotFound, 
corrupt-vs-missing, IoError, and magic-footer-wording cases. Verified locally 
on Spark 4.x and Spark 3.5 / Delta 3.3.2 (the `DeltaVacuumSuite` "vacuum for 
cdc" tests now pass).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to