wirybeaver opened a new issue, #4632:
URL: https://github.com/apache/datafusion-comet/issues/4632

   ### What is the problem the feature request solves?
   
   Comet has native scan support for formats such as Iceberg, but Lance tables 
planned by Spark through Lance Spark currently execute through Spark's Lance 
reader. A native Lance path would let Comet read ordinary Lance table scans 
directly in Rust while preserving Spark/Lance Spark as the planning contract.
   
   The native Lance path should be optional, and default Comet builds should not pick up any Lance dependency.
   
   ### Describe the potential solution
   
   Add Lance as an experimental, opt-in Comet contrib reader:
   
   - Keep Spark planning Lance tables through Lance Spark.
   - Detect Lance V2 `BatchScanExec` plans by reflection in Comet core.
   - Extract a stable native-read descriptor exposed by Lance Spark.
   - Add a build-gated `contrib-lance` profile / Cargo feature so default 
builds do not depend on Lance.
   - Add a typed `lance_scan = 118` native proto payload with common scan 
invariants and per-partition split descriptors.
   - Execute the assigned Lance fragments through the native Rust Lance API.
   - Gate runtime activation behind 
`spark.comet.scan.lanceNative.enabled`, which defaults to `false`.
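
To make the "common scan invariants plus per-partition split descriptors" idea concrete, here is a rough Rust sketch of how the payload could be modeled on the native side. Every type and field name below (`LanceScanCommon`, `LanceSplit`, `dataset_uri`, and so on) is an assumption for illustration only; the actual proto schema and the descriptor exposed by Lance Spark are still to be designed.

```rust
use std::collections::BTreeMap;

/// Scan-wide settings shared by every partition.
/// All field names are hypothetical, not an agreed schema.
#[derive(Debug, Clone, PartialEq)]
pub struct LanceScanCommon {
    pub dataset_uri: String,
    pub storage_options: BTreeMap<String, String>,
    pub projected_columns: Vec<String>,
    pub filter_sql: Option<String>,
    pub limit: Option<u64>,
    pub offset: Option<u64>,
    pub batch_size: u32,
}

/// Fragments assigned to a single Spark partition.
#[derive(Debug, Clone, PartialEq)]
pub struct LanceSplit {
    pub fragment_ids: Vec<u64>,
}

/// Whole payload: one common block plus one split per partition.
#[derive(Debug, Clone, PartialEq)]
pub struct LanceScan {
    pub common: LanceScanCommon,
    pub splits: Vec<LanceSplit>,
}

impl LanceScan {
    /// Fragment ids the given partition should read,
    /// or `None` when the partition index is out of range.
    pub fn fragments_for_partition(&self, partition: usize) -> Option<&[u64]> {
        self.splits.get(partition).map(|s| s.fragment_ids.as_slice())
    }
}

/// Builds a small example payload with two partitions.
pub fn example_scan() -> LanceScan {
    LanceScan {
        common: LanceScanCommon {
            dataset_uri: "file:///tmp/example.lance".to_string(),
            storage_options: BTreeMap::new(),
            projected_columns: vec!["id".to_string(), "name".to_string()],
            filter_sql: Some("id > 10".to_string()),
            limit: Some(100),
            offset: None,
            batch_size: 8192,
        },
        splits: vec![
            LanceSplit { fragment_ids: vec![0, 1] },
            LanceSplit { fragment_ids: vec![2, 3] },
        ],
    }
}
```

Keeping the common block separate from the splits means the scan-wide settings are stated once, and each task only needs its own fragment list on top of them.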
   
   Minimal v1 should target ordinary Lance table reads only: local/direct 
object-store storage options, projection, filter SQL, limit/offset, batch size, 
fragment splits, and Comet-supported Spark types. Search, hybrid search, 
index-backed planning, namespace-backed credential refresh, metadata/version 
columns, and aggregation pushdown should be added in later phases after 
separate semantic review.
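
One way to enforce that v1 boundary is an explicit allow-list check before the native path is chosen. The sketch below is only illustrative: the `ReadFeature` enum and `first_unsupported` helper are hypothetical names, and the idea that an unsupported feature would leave the scan on Spark's existing Lance reader is an assumption rather than a settled design.

```rust
/// Read features a Lance scan might request (hypothetical names).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ReadFeature {
    Projection,
    FilterSql,
    LimitOffset,
    BatchSize,
    FragmentSplits,
    Search,
    HybridSearch,
    IndexBackedPlanning,
    NamespaceCredentialRefresh,
    MetadataOrVersionColumns,
    AggregationPushdown,
}

impl ReadFeature {
    /// True for the features listed under minimal v1 in this issue.
    pub fn in_v1(self) -> bool {
        matches!(
            self,
            ReadFeature::Projection
                | ReadFeature::FilterSql
                | ReadFeature::LimitOffset
                | ReadFeature::BatchSize
                | ReadFeature::FragmentSplits
        )
    }
}

/// Returns the first requested feature that falls outside v1, if any.
/// A caller could treat `Some(_)` as "do not use the native path".
pub fn first_unsupported(requested: &[ReadFeature]) -> Option<ReadFeature> {
    requested.iter().copied().find(|f| !f.in_v1())
}
```

With this shape, a plan that needs only projection and a SQL filter passes the check, while one that also asks for hybrid search is rejected with the offending feature named, which could be surfaced in a fallback message.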
   
   ### Additional context
   
   The phased roadmap is:
   
   1. Lance Spark native descriptor for ordinary reads.
   2. Comet contrib scaffold and reflection bridge.
   3. Minimal native Rust Lance scan v1.
   4. V1 hardening and CI parity tests.
   5. Advanced table-read parity and metrics.
   6. Lance index/search read support.
   7. Remote namespace and credential refresh support.
   
   Known prototype blocker to resolve before this can be considered 
merge-ready: packaged Comet currently contains `org.apache.arrow.c` classes 
rewritten against Comet's shaded Arrow allocator, while Lance Spark expects the 
normal Arrow C Data ABI. An end-to-end packaged smoke test with both jars 
exposes this classpath conflict. The final design needs an explicit Arrow C 
Data packaging/classloader strategy and CI coverage for Comet + Lance Spark 
together.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

