andygrove opened a new pull request, #4416:
URL: https://github.com/apache/datafusion-comet/pull/4416

   ## Which issue does this PR close?
   
   Closes #.
   
   ## Rationale for this change
   
   `CometSelectionVector` and `CometDelegateVector` became unreachable when the 
`native_iceberg_compat` JVM Parquet reader was removed (commits 7203fb596 and 
0fd52b72d). The Iceberg row-deletion path that produced selection vectors is 
now handled natively via `iceberg-rust` in `CometIcebergNativeScanExec`, and 
`CometDelegateVector` had zero callers or subclasses.
   
   With no live creator for these classes, several other surfaces became dead:
   
   - The `hasSelectionVectors` / `exportSelectionIndices` JNI methods on 
`CometBatchIterator` and their callers in `ScanExec::get_next` always took the 
no-op path.
   - The `CometSelectionVector` case in `NativeUtil.exportBatch` and the orphan 
`exportSingleVector` helper were unreachable.
   - The `setNumNulls` / `setNumValues` mutation API on `CometVector` / 
`CometDecodedVector` / `CometPlainVector` had its only callers inside the two 
deleted classes.
   - The `isReused` field on `CometPlainVector` is never set to `true` (no 
constructor passes it, no caller of `setReused`), so the `if 
(!cometPlainColumnVector.isReused())` branch in the `CometColumnarToRowExec` 
SPARK-50235 cleanup loop always executed.
   
   Removing these tightens the API surface and makes it clearer that 
`CometVector` instances are immutable after construction in the live native 
execution path.
   
   ## What changes are included in this PR?
   
   - Delete `CometSelectionVector.java` and `CometDelegateVector.java`.
   - Remove `hasSelectionVectors` / `exportSelectionIndices` from 
`CometBatchIterator`, the matching JNI bindings in 
`native/jni-bridge/src/batch_iterator.rs`, and the `get_selection_indices` 
branch in `native/core/src/execution/operators/scan.rs`.
   - Remove the `CometSelectionVector` case and the now-orphan 
`exportSingleVector` from `NativeUtil.scala`.
   - Remove `setNumNulls` / `setNumValues` from `CometVector`, 
`CometDecodedVector`, and `CometPlainVector`. Mark the now-final `numNulls` / 
`numValues` / `hasNull` fields on `CometDecodedVector` `final`.
   - Remove `isReused` / `setReused` (and the 4-arg constructor) from 
`CometPlainVector`, and simplify the SPARK-50235 cleanup loop in 
`CometColumnarToRowExec` to always close per-batch vectors that are neither 
`WritableColumnVector` nor `ConstantColumnVector`.
   - Add class-level Javadoc to the remaining vector classes (`CometVector`, 
`CometDecodedVector`, `CometPlainVector`, `CometDictionaryVector`, 
`CometDictionary`, `CometListVector`, `CometMapVector`, `CometStructVector`) 
explaining why a custom hierarchy is needed (shaded Arrow vs. Spark's 
`ArrowColumnVector`, FFI lifecycle, dictionary lazy decoding).
   
   ## How are these changes tested?
   
   Existing tests cover the affected paths:
   
   - `CometExecSuite` "TopK operator should return correct results on 
dictionary column with nulls" exercises `CometDictionaryVector` end-to-end.
   - `CometExecSuite` "CometBroadcastExchange could be converted to rows using 
CometColumnarToRow" exercises the simplified `CometColumnarToRowExec` cleanup 
loop.
   - `CometExecSuite` "project + filter on arrays" exercises `CometListVector` 
through native execution.
   - `TestColumnReader` constructs `CometPlainVector` directly via the 
surviving 2-arg constructor.
   
   All four pass locally. Native `cargo build` and `cargo clippy --all-targets 
--workspace -- -D warnings` are clean.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to