schenksj commented on PR #4525:
URL: 
https://github.com/apache/datafusion-comet/pull/4525#issuecomment-4587660475

   Thanks @andygrove, and thanks for building the branch and probing the gate 
directly. That `hdfs` divergence is a real silent-fallback regression and worth 
closing before merge.
   
   **`hdfs://` default.** Adopted your suggestion: when 
`spark.hadoop.fs.comet.libhdfs.schemes` is unset, `libhdfsSchemes` now defaults 
to `Set("hdfs")`, mirroring the native `is_hdfs_scheme` default (`scheme == 
"hdfs"` when the config is unset) and the default `["hdfs-opendal"]` build. So 
a plain `hdfs://` V1 scan stays claimed by Comet instead of silently falling 
back. `s3a`/`file` are unaffected (object_store recognizes them via 
`parse_url`), and an explicit config still takes over verbatim.
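
A minimal sketch of the defaulting rule described above, not the actual Comet code. The object name, the `isClaimable` helper, the comma-separated config format, and the `objectStoreSchemes` set (limited to the two schemes named above) are all assumptions made for illustration.

```scala
import java.net.URI

// Hypothetical stand-in for the scheme gate; names here are illustrative only.
object SchemeGateSketch {

  // `configured` models spark.hadoop.fs.comet.libhdfs.schemes.
  // Assumption: the value is a comma-separated list of schemes.
  def libhdfsSchemes(configured: Option[String]): Set[String] = configured match {
    // An explicit config takes over verbatim, even if it omits "hdfs".
    case Some(raw) => raw.split(",").map(_.trim.toLowerCase).filter(_.nonEmpty).toSet
    // Unset: default to "hdfs" (previously Set.empty, which declined hdfs://).
    case None => Set("hdfs")
  }

  // Assumption: only the two object_store schemes mentioned in this comment.
  val objectStoreSchemes: Set[String] = Set("file", "s3a")

  // A path is claimable if either object_store or the libhdfs set covers its scheme.
  // Assumption: a URI without a scheme is treated as a local file.
  def isClaimable(uri: URI, configured: Option[String]): Boolean = {
    val scheme = Option(uri.getScheme).map(_.toLowerCase).getOrElse("file")
    objectStoreSchemes.contains(scheme) || libhdfsSchemes(configured).contains(scheme)
  }
}
```

With the config unset, `hdfs://` paths are claimed and `fake://` paths are declined. Setting the config to a list without `hdfs` declines `hdfs://` again, which is the "verbatim" behaviour.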
   
   **Test.** Added `native scan claims hdfs:// when libhdfs.schemes is unset` 
to `CometScanSchemeFallbackSuite`, alongside the existing `fake://` decline 
case. It backs the `hdfs` scheme with a local file system 
(`FakeHdfsSchemeFileSystem`, a `RawLocalFileSystem` reporting `getScheme = 
"hdfs"`), so an `hdfs://` path is exercised without a live cluster. It then 
applies `CometScanRule` to the plan and asserts the scan is **claimed** (a 
`CometScanExec` is present and no `FileSourceScanExec` is left over). It's a 
real guard: it fails with the old `case None => Set.empty` (hdfs declined) and 
passes with the `Set("hdfs")` default.
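
To make the "claimed" assertion concrete, here is a toy model of the check. It uses no Spark or Comet classes: `PlanNode`, `FileSourceScan`, `CometScan`, `Project`, and `ClaimCheckSketch` are hypothetical stand-ins for the real plan nodes and rule, so treat it as a sketch of the assertion's shape only.

```scala
// Toy plan tree; these types are illustrative stand-ins, not Spark classes.
sealed trait PlanNode { def children: Seq[PlanNode] }
final case class FileSourceScan(uri: String) extends PlanNode {
  val children: Seq[PlanNode] = Nil
}
final case class CometScan(uri: String) extends PlanNode {
  val children: Seq[PlanNode] = Nil
}
final case class Project(child: PlanNode) extends PlanNode {
  val children: Seq[PlanNode] = Seq(child)
}

object ClaimCheckSketch {
  // Stand-in for applying the scan rule: swap a scan for the native one
  // only when the (hypothetical) `claims` predicate accepts its URI.
  def applyRule(plan: PlanNode, claims: String => Boolean): PlanNode = plan match {
    case FileSourceScan(uri) if claims(uri) => CometScan(uri)
    case Project(child)                     => Project(applyRule(child, claims))
    case other                              => other
  }

  // Pre-order list of every node in the plan.
  def flatten(plan: PlanNode): Seq[PlanNode] =
    plan +: plan.children.flatMap(flatten)

  // "Claimed" means a native scan is present and no original scan is left over.
  def isClaimed(plan: PlanNode): Boolean = {
    val nodes = flatten(plan)
    nodes.exists(_.isInstanceOf[CometScan]) &&
      !nodes.exists(_.isInstanceOf[FileSourceScan])
  }
}
```

A predicate that rejects everything plays the role of the old `Set.empty` default and leaves the plan unclaimed; one that accepts `hdfs://` plays the role of the new default.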
   
   **V2 `BatchScanExec`.** This change is intentionally V1-only: the gate lives 
in the `FileSourceScanExec` path (`nativeScan`). The V2 native paths Comet 
currently claims are CSV-V2 and Iceberg. Iceberg resolves IO through its own 
FileIO rather than the V1 `rootPaths → parse_url` route, so it doesn't hit the 
same `Unable to recognise URL` error. If/when a native Parquet-V2 scan lands, it 
should get a parallel scheme gate. Happy to file a follow-up issue to track that.
   
   Also fixed a latent compile break the gate carried: the decline branch still 
called `withInfo`, which #4508 renamed to `withFallbackReason`. It now calls 
`withFallbackReason`.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
