wirybeaver opened a new pull request, #4633: URL: https://github.com/apache/datafusion-comet/pull/4633
## Which issue does this PR close?

Closes #4632.

## Rationale for this change

Comet already has a native table-scan path for Iceberg. Lance tables are currently planned and read through Lance Spark. This prototype keeps Lance Spark as the Spark planning contract, then lets an optional Comet contrib reader detect Lance V2 scans, extract a stable descriptor from Lance Spark, and execute the assigned Lance fragments through native Rust Lance APIs.

The Lance Spark side of the descriptor contract is proposed in lance-format/lance-spark#624.

## What changes are included in this PR?

- Adds an opt-in `contrib-lance` Maven profile and Rust `contrib-lance` feature.
- Adds a small reflection-only Lance bridge in Comet core so default builds do not depend on Lance Spark.
- Adds `spark.comet.scan.lanceNative.enabled`, disabled by default.
- Extends scan planning to detect Lance `BatchScanExec` plans and delegate to contrib-lance when present and enabled.
- Adds typed native proto support with `lance_scan = 118` and split-mode payloads.
- Adds Scala contrib serialization/execution classes for Lance native scans.
- Adds Rust native `LanceScanExec` using the Rust Lance API for dataset open, fragment selection, projection, filter SQL, limit/offset, batch size, and record batch streaming.

This is intentionally a draft prototype. Minimal v1 scope is ordinary Lance table reads only. Index/search reads, namespace-backed credential refresh, metadata/version columns, aggregation pushdown, and production CI coverage are future phases.

Known blocker before this can be merge-ready: packaged Comet currently contains `org.apache.arrow.c` classes rewritten against Comet's shaded Arrow allocator, while Lance Spark expects the normal Arrow C Data ABI. A packaged Spark smoke with both jars exposes this classpath conflict. We need an explicit Arrow C Data packaging/classloader strategy for Comet + Lance Spark before merging a production-ready native Lance reader.

## How are these changes tested?
Passed:

- `~/.cargo/bin/cargo check -p datafusion-comet --no-default-features`
- `~/.cargo/bin/cargo check -p datafusion-comet --no-default-features --features contrib-lance`
- `./mvnw test -Dtest=none -Dsuites="org.apache.comet.rules.CometScanRuleSuite" -Pspark-4.1,contrib-lance -Dscalastyle.skip=true`
- `./mvnw package -DskipTests -Pspark-4.1,contrib-lance -Dscalastyle.skip=true`

Smoke attempted:

- `source ~/uvenv/common/bin/activate && python /home/user/draft/comet_lance_native_smoke.py`

The smoke writes and reads a local Lance dataset, but packaged Comet + Lance Spark currently fails at runtime with an Arrow C Data ABI/classpath conflict as described above. The draft PR keeps that blocker visible for design review instead of hiding it behind unit-only coverage.
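
For design review, the opt-in gating described in the list of changes (detect a Lance `BatchScanExec`, then delegate to contrib-lance only when it is present and enabled) can be modeled roughly as below. This is an illustrative Python sketch, not code from the PR: `ScanCandidate`, `lance_native_enabled`, `choose_scan_path`, and the returned path labels are hypothetical names, and the real logic lives in Comet's Scala scan rule. Only the config key and its disabled-by-default behavior come from the description above.

```python
from dataclasses import dataclass
from typing import Mapping

# Config key taken from the PR description; it is disabled by default.
LANCE_NATIVE_ENABLED_KEY = "spark.comet.scan.lanceNative.enabled"


@dataclass(frozen=True)
class ScanCandidate:
    """Hypothetical stand-in for what the planner knows about one scan node."""
    is_batch_scan: bool         # the plan node is a BatchScanExec
    is_lance_scan: bool         # the underlying V2 scan comes from Lance Spark
    contrib_on_classpath: bool  # contrib-lance classes were found via reflection


def lance_native_enabled(conf: Mapping[str, str]) -> bool:
    """Read the opt-in flag; a missing or non-"true" value means disabled."""
    return conf.get(LANCE_NATIVE_ENABLED_KEY, "false").strip().lower() == "true"


def choose_scan_path(conf: Mapping[str, str], scan: ScanCandidate) -> str:
    """Return a label for the reader that would execute this scan.

    The native path is chosen only when every condition holds; otherwise the
    scan stays on Lance Spark, which mirrors the default-build behavior.
    """
    if (
        scan.is_batch_scan
        and scan.is_lance_scan
        and scan.contrib_on_classpath
        and lance_native_enabled(conf)
    ):
        return "comet-native-lance"
    return "lance-spark"
```

The point of the model is that any missing condition (flag unset, contrib jar absent, or a non-Lance scan) falls back to the existing Lance Spark read path, so default builds and default configurations are unaffected.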
