sezruby commented on issue #12263:
URL: https://github.com/apache/gluten/issues/12263#issuecomment-4683035652

   A few additional points worth deciding explicitly before we file the 
lance-spark issue, on top of the direction @LuciferYang and @FelixYBW already 
agreed on. Mostly about scope and edges — none of this changes the C-ABI design.
   
   **1. `COUNT(*)` carve-out.** Lance has `LanceScanner.countRows()` (a JNI 
shortcut, no scan, no Arrow), and lance-spark already wires it via 
`LanceCountStarPartitionReader`. Gluten's offload rule shouldn't match that 
case — Velox can't beat a native row count. One predicate in the rule, prevents 
an obvious regression on a very common query.
   
   **2. Virtual columns (`_fragid`, `<col>__blob_position`, 
`<col>__blob_size`).** These are appended Java-side in 
`LanceFragmentColumnarBatchScanner.loadNextBatch` lines 65-73, after the Arrow 
import — they're NOT in the C-stream. Three options:
   
   - (a) Validator falls back to vanilla when projected. Simple, safe; loses 
offload on these queries.
   - (b) Synthesize them in a Velox `Project` node above the offloaded scan 
(`_fragid` is a per-partition literal int; blob position/size are deterministic 
transforms over a `StructVector`). Keeps offload, but bakes lance-domain logic 
into gluten.
   - (c) Push them into the native scanner via a future lance-core change. Out 
of scope.
   
   I'd ship (a) for v1 and revisit (b) once the integration lands.
   
   **3. Row-level ops out of scope.** `MERGE`/`UPDATE`/`DELETE` go through 
`SparkPositionDeltaWrite`; backfills go through `AddColumnsBackfillWrite` / 
`UpdateColumnsBackfillWrite`. Gluten doesn't offload row-level ops on any 
source today (iceberg falls back too). Explicit: rule matches plain 
`BatchScanExec` (read) + plain `LanceBatchWrite` (insert/CTAS) only. Everything 
else falls through unchanged.
   
   **4. Memory accounting is a non-issue with this design.** 
`LanceRuntime.allocator()`'s unbounded `RootAllocator` worries me less than I 
initially thought — gluten allocates the `ArrowArrayStream` from 
`ArrowBufferAllocators.contextInstance()` (which IS wired into Spark 
`TaskMemoryManager` via `ArrowReservationListener`), and the C-data release 
callback pulls released memory back through that allocator. Lance-spark's 
allocator never holds anything for gluten on the read path. So no need to add a 
caller-supplied `BufferAllocator` API on lance-spark — the natural ownership 
boundary already does the right thing.
   
   **5. `WriteFragmentBuilder.schema(LanceSchema)` is an optimization, not a 
requirement.** Verified on `lance v8.0.0-beta.9` (lance-spark's pinned version) 
— `Fragment.write().data(ArrowArrayStream).execute()` already passes only 
`stream.memoryAddress()` across JNI; the schema travels with the stream's 
`get_schema` callback. So `schema(...)` is useful for skipping a 
`dataset.open()` round-trip per task, but the gluten writer is correct without 
it. Adopt as a perf follow-up.
   
   **6. Phasing.** Worth being explicit:
   - **Phase 1 (JVM-only):** All three lance-spark API changes + `gluten-lance` 
module with `OffloadLanceScan`, `LanceScanTransformer`, 
`LanceWriteTransformer`. No cpp changes. Reuses 
`ArrowAbiUtil.importToSparkColumnarBatch` + the existing 
`ArrowColumnarToVeloxColumnarExec` transition. Ships the integration.
   - **Phase 2 (cpp-side):** Direct C-stream import into Velox via 
`Bridge.cpp`, skipping the Arrow→Velox transition. Pure perf, no behavior 
change, ships independently after Phase 1 lands.
   - **Phase 3 (when upstream lands):** If velox#16556 lands cleanly, expose it 
as an alternative path. lance-spark stays the default. If lance-core adds 
native virtual-column support, switch (a)→(c) above.
   
   **7. What survives lance-spark planning.** For the record, since it's worth 
being clear: the DSv2 `Scan` builder runs before gluten rules, so filter 
pushdown, column pruning, kNN prefilter, TopN, zonemap pruning, fragment 
skipping, and namespace integration all stay intact. Gluten only intercepts the 
per-batch handoff, not the planning. Aggregations / joins / windows offload to 
Velox above the scan as usual.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to