schenksj opened a new pull request, #4525: URL: https://github.com/apache/datafusion-comet/pull/4525
## Which issue does this PR close?

Closes #4520.

## Rationale for this change

Comet's native readers go through `object_store`, which only understands a fixed set of URL schemes. When a scan's path uses a custom Hadoop `FileSystem` scheme (e.g. registered via `spark.hadoop.fs.<scheme>.impl`), the native reader fails at *execution* with `Generic URL error: Unable to recognise URL "..."`; there is no graceful recovery once native execution has started.

This was surfaced by Delta tables opened with custom filesystem options (`DeltaTable.forPath(spark, path, fsOptions)`), where Delta reads its internal `_delta_log/*.checkpoint.parquet` via ordinary V1 parquet scans that Comet then claimed and crashed on, but it reproduces for *any* V1 parquet scan on such a scheme.

## What changes are included in this PR?

`CometScanRule` declines a V1 native scan when its root-path scheme isn't natively readable, so Spark's Hadoop-FS-aware reader handles it.

Rather than hardcode the object_store-supported scheme set in the planner (a mirror that drifts), the answer comes from the native layer itself: a new `NativeBase.isObjectStoreSchemeSupported` JNI method backed by `object_store`'s own `ObjectStoreScheme::parse`, the same path `prepare_object_store_with_configs` dispatches through. The user's libhdfs scheme config (`spark.hadoop.fs.comet.libhdfs.schemes`) is unioned in on the JVM side; results are cached per scheme; and if native can't be consulted the scheme is assumed supported rather than over-restricting.

## How are these changes tested?

`CometScanSchemeFallbackSuite` registers `FakeHDFSFileSystem` for a `fake://` scheme (not routed through libhdfs) and applies `CometScanRule` to the scan's physical plan. It asserts the scan falls back to Spark (no `CometScanExec`). The test **fails without the gate** (Comet claims the `fake://` scan) and **passes with it**.
The libhdfs-scheme regression guard (`ParquetReadFromFakeHadoopFsSuite`) continues to engage Comet for configured libhdfs schemes.
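
For readers who want a concrete picture of the gate described under "What changes are included in this PR?", the following is a minimal Java sketch of the decision logic only. It is not the code in this PR: the `SchemeGate` class, its constructor, and the `nativeCheck` predicate (a stand-in for the `NativeBase.isObjectStoreSchemeSupported` JNI call) are hypothetical names, and treating a scheme-less path as readable is an assumption of the sketch. It shows the three behaviours the description names: the union with the configured libhdfs schemes, the per-scheme cache, and the "assume supported" fallback when native cannot be consulted.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;

/** Hypothetical sketch of a per-scheme gate; not Comet's actual implementation. */
final class SchemeGate {
    // Stand-in for the JNI call into the native layer (assumed shape).
    private final Predicate<String> nativeCheck;
    // Schemes the user routed through libhdfs, assumed already lowercased.
    private final Set<String> libhdfsSchemes;
    // One answer per scheme, so the native layer is asked at most once per scheme.
    private final ConcurrentHashMap<String, Boolean> cache = new ConcurrentHashMap<>();

    SchemeGate(Predicate<String> nativeCheck, Set<String> libhdfsSchemes) {
        this.nativeCheck = nativeCheck;
        this.libhdfsSchemes = libhdfsSchemes;
    }

    /** Returns false only when the scheme is known to be unreadable natively. */
    boolean isNativelyReadable(URI rootPath) {
        String scheme = rootPath.getScheme();
        if (scheme == null) {
            return true; // Assumption of this sketch: scheme-less paths are not gated.
        }
        String key = scheme.toLowerCase(Locale.ROOT);
        return cache.computeIfAbsent(key, this::check);
    }

    private boolean check(String scheme) {
        if (libhdfsSchemes.contains(scheme)) {
            return true; // Union with the user's libhdfs scheme configuration.
        }
        try {
            return nativeCheck.test(scheme);
        } catch (RuntimeException | LinkageError e) {
            // Native could not be consulted: assume supported rather than
            // over-restricting. This sketch also caches that fallback answer.
            return true;
        }
    }
}
```

A planner rule built on such a gate would decline the native scan whenever `isNativelyReadable` returns false, leaving the scan to Spark's Hadoop-FS-aware reader.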
