schenksj commented on PR #4525:
URL:
https://github.com/apache/datafusion-comet/pull/4525#issuecomment-4587660475
Thanks @andygrove, and thanks for building the branch and probing the gate
directly; that `hdfs` divergence is a real silent-fallback regression and worth
closing before merge.
**`hdfs://` default.** Adopted your suggestion: when
`spark.hadoop.fs.comet.libhdfs.schemes` is unset, `libhdfsSchemes` now defaults
to `Set("hdfs")`. This mirrors the native `is_hdfs_scheme` default
(`scheme == "hdfs"` when the config is unset) and the default
`["hdfs-opendal"]` build. As a result, a plain `hdfs://` V1 scan stays claimed
by Comet instead of silently falling back. `s3a` and `file` are unaffected
(object_store recognizes them via `parse_url`), and an explicit config still
replaces the default verbatim.
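To make the resolution rule concrete, here is a minimal sketch of how the default and the claim decision fit together. It is illustrative only, not Comet's actual code. The config key and the `Set("hdfs")` default come from the description above. The object name `SchemeGateSketch`, the `claims` helper, the contents of `ObjectStoreSchemes`, and the comma-separated config format are all assumptions.

```scala
// Hypothetical sketch of the V1 scheme gate; only the config key and the
// Set("hdfs") default come from the PR discussion, everything else is assumed.
object SchemeGateSketch {
  val LibhdfsSchemesKey = "spark.hadoop.fs.comet.libhdfs.schemes"

  // Assumption: an illustrative subset of schemes that object_store's
  // parse_url recognizes on its own (includes the s3a/file cases above).
  val ObjectStoreSchemes: Set[String] = Set("file", "s3", "s3a")

  /** Schemes routed to libhdfs. Unset: Set("hdfs"), mirroring the native
    * is_hdfs_scheme default. Set: the configured list replaces the default
    * entirely (assumed comma-separated). */
  def libhdfsSchemes(conf: Map[String, String]): Set[String] =
    conf.get(LibhdfsSchemesKey) match {
      case None        => Set("hdfs") // previously Set.empty, which declined hdfs://
      case Some(value) => value.split(",").map(_.trim).filter(_.nonEmpty).toSet
    }

  /** True if the native scan can serve a path with this scheme. */
  def claims(scheme: String, conf: Map[String, String]): Boolean =
    ObjectStoreSchemes.contains(scheme) || libhdfsSchemes(conf).contains(scheme)
}
```

With no config, `hdfs` and `s3a` are claimed and an unknown scheme such as `fake` is declined. If an explicit setting omits `hdfs`, that opts `hdfs://` out of the native scan.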
**Test.** Added `native scan claims hdfs:// when libhdfs.schemes is unset`
to `CometScanSchemeFallbackSuite`, alongside the existing `fake://` decline
case. It backs the `hdfs` scheme with a local FS (`FakeHdfsSchemeFileSystem`,
RawLocalFileSystem reporting `getScheme = "hdfs"`) so an `hdfs://` path is
exercised without a live cluster, then applies `CometScanRule` to the plan and
asserts the scan is **claimed** (a `CometScanExec`, no leftover
`FileSourceScanExec`). It's a real guard: it fails with the old `case None =>
Set.empty` (hdfs declined) and passes with the `Set("hdfs")` default.
**V2 `BatchScanExec`.** This is intentionally V1-only: the gate lives in the
`FileSourceScanExec` path (`nativeScan`). The V2 native paths Comet currently
claims are CSV-V2 and Iceberg. Iceberg resolves IO through its own FileIO
rather than the V1 `rootPaths → parse_url` route, so it doesn't hit the same
`Unable to recognise URL` error. If/when a native Parquet-V2 scan lands, it
should get a parallel scheme gate; happy to file a follow-up issue to track
that.
Also fixed a latent compile break in the gate: the decline branch still
called `withInfo`, which #4508 renamed to `withFallbackReason`. It now calls
`withFallbackReason` instead.
--
This is an automated message from the Apache Git Service.