sezruby commented on issue #12263:
URL: https://github.com/apache/gluten/issues/12263#issuecomment-4723779006

   Thanks @jja725 — I read through facebookincubator/velox#16556 to understand 
the boundary. A few observations that I think make these efforts compose rather 
than compete:
   
   The connector's data plane (`lance_batch_to_arrow` → 
`ArrowArray`/`ArrowSchema`) is the same Arrow C-ABI handoff regardless of path, 
so there's no conflict at the data level. The open question is **who supplies 
the scan parameters**. Right now `LanceDataSource::addSplit` opens with 
`lance_dataset_open(path, nullptr /*storage_opts*/, 0 /*version*/)` and 
`lance_scanner_new(..., nullptr /*filter*/)` — so storage options, version 
pinning, and filter pushdown aren't wired through `LanceConnectorSplit` / 
`LanceTableHandle` yet.
   
   That's exactly the state @wirybeaver's `LanceNativeScanPlan` descriptor 
(lance-format/lance-spark#624) captures — resolved version, storage options, 
pushed filter SQL, namespace context, and fragment IDs per split. So the clean 
composition looks like:
   
   ```
   lance-spark plans
     -> #624 descriptor (LanceNativeScanPlan)
       -> gluten maps to LanceConnectorSplit (+filter / version / storage_opts)
         -> Velox Lance connector
           -> Arrow C-batches
   ```
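   
   To make the middle step concrete, here is a minimal sketch of the mapping, 
assuming the descriptor exposes the fields listed above. Both 
`ScanPlanDescriptor` and `SplitSketch` are hypothetical stand-ins, not the real 
`LanceNativeScanPlan` or `LanceConnectorSplit` types, and namespace context is 
left out for brevity:
   
   ```cpp
   #include <cassert>
   #include <cstdint>
   #include <map>
   #include <optional>
   #include <string>
   #include <vector>

   // Hypothetical mirror of the fields the #624 descriptor is said to carry.
   // Field names are illustrative and not taken from lance-spark.
   struct ScanPlanDescriptor {
     std::string datasetUri;
     uint64_t resolvedVersion = 0;
     std::map<std::string, std::string> storageOptions;
     std::optional<std::string> pushedFilterSql;
     // One entry per split, each holding the fragment IDs for that split.
     std::vector<std::vector<uint32_t>> fragmentIdsPerSplit;
   };

   // Hypothetical extended split: what a connector split could look like
   // if it carried version, storage options, and filter.
   struct SplitSketch {
     std::string path;
     uint64_t version;
     std::map<std::string, std::string> storageOptions;
     std::optional<std::string> filter;
     std::vector<uint32_t> fragmentIds;
   };

   // Fan the plan-level parameters out to one split per fragment group.
   std::vector<SplitSketch> toSplits(const ScanPlanDescriptor& plan) {
     std::vector<SplitSketch> splits;
     splits.reserve(plan.fragmentIdsPerSplit.size());
     for (const auto& fragmentIds : plan.fragmentIdsPerSplit) {
       splits.push_back(SplitSketch{
           plan.datasetUri,
           plan.resolvedVersion,
           plan.storageOptions,
           plan.pushedFilterSql,
           fragmentIds});
     }
     return splits;
   }
   ```
   
   The point of the sketch is that every split inherits the same resolved 
version, storage options, and filter, so only the fragment IDs vary per split.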
   
   For **shipping sooner**, the read path I'm taking reuses lance-spark's 
*already-executed* scanner output via an Arrow C-stream 
(lance-format/lance#7259), which inherits lance-spark's planning (incl. the kNN 
prefilter) for free without the connector needing storage-opts / filter / 
version support yet. When the connector matures, the gluten offload rule can 
switch from "C-stream over lance-spark's scan" to "emit a 
LanceConnectorSplit from the #624 descriptor" without changing the rule's 
boundary.
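   
   One way the rule could make that switch is a per-scan capability check. This 
is only a sketch under my own assumptions: `ConnectorCapabilities`, 
`ScanRequirements`, and `choosePath` are hypothetical names, and neither gluten 
nor the Velox connector exposes such flags today:
   
   ```cpp
   #include <cassert>

   // The two read paths discussed above. Enumerator names are illustrative.
   enum class ReadPath { kCStreamOverSparkScan, kConnectorSplit };

   // Hypothetical flags describing what the native connector supports.
   // All default to false, matching the current state described above.
   struct ConnectorCapabilities {
     bool storageOptions = false;
     bool versionPinning = false;
     bool filterPushdown = false;
   };

   // Hypothetical summary of what a given scan needs from the reader.
   struct ScanRequirements {
     bool needsStorageOptions = false;
     bool needsVersionPin = false;
     bool needsFilter = false;
   };

   // Use the connector split only when it can honor everything the scan
   // needs; otherwise fall back to the C-stream over lance-spark's scan.
   ReadPath choosePath(const ConnectorCapabilities& caps,
                       const ScanRequirements& req) {
     const bool supported =
         (!req.needsStorageOptions || caps.storageOptions) &&
         (!req.needsVersionPin || caps.versionPinning) &&
         (!req.needsFilter || caps.filterPushdown);
     return supported ? ReadPath::kConnectorSplit
                      : ReadPath::kCStreamOverSparkScan;
   }
   ```
   
   With the check inside the rule, the plan shape that gluten matches stays the 
same and only the emitted native source changes.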
   
   Two questions for @jja725:
   1. Do you intend `LanceConnectorSplit` / `LanceTableHandle` to carry version 
+ storage options + pushed filter, or would those come from query config or a 
descriptor like #624?
   2. Is `liblance_c.a` the same C wrapper the Java SDK's JNI uses, or a 
separate surface? It affects whether the filter-string dialect matches what 
lance-spark pushes down.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

