andygrove opened a new pull request, #4416: URL: https://github.com/apache/datafusion-comet/pull/4416
## Which issue does this PR close? Closes #. ## Rationale for this change `CometSelectionVector` and `CometDelegateVector` became unreachable when the `native_iceberg_compat` JVM Parquet reader was removed (commits 7203fb596 and 0fd52b72d). The Iceberg row-deletion path that produced selection vectors is now handled natively via `iceberg-rust` in `CometIcebergNativeScanExec`, and `CometDelegateVector` had zero callers or subclasses. With no live creator for these classes, several other surfaces became dead: - The `hasSelectionVectors` / `exportSelectionIndices` JNI methods on `CometBatchIterator` and their callers in `ScanExec::get_next` always took the no-op path. - The `CometSelectionVector` case in `NativeUtil.exportBatch` and the orphan `exportSingleVector` helper were unreachable. - The `setNumNulls` / `setNumValues` mutation API on `CometVector` / `CometDecodedVector` / `CometPlainVector` had its only callers inside the two deleted classes. - The `isReused` field on `CometPlainVector` is never set to `true` (no constructor passes it, no caller of `setReused`), so the `if (!cometPlainColumnVector.isReused())` branch in the `CometColumnarToRowExec` SPARK-50235 cleanup loop always executed. Removing these tightens the API surface and makes it clearer that `CometVector` instances are immutable after construction in the live native execution path. ## What changes are included in this PR? - Delete `CometSelectionVector.java` and `CometDelegateVector.java`. - Remove `hasSelectionVectors` / `exportSelectionIndices` from `CometBatchIterator`, the matching JNI bindings in `native/jni-bridge/src/batch_iterator.rs`, and the `get_selection_indices` branch in `native/core/src/execution/operators/scan.rs`. - Remove the `CometSelectionVector` case and the now-orphan `exportSingleVector` from `NativeUtil.scala`. - Remove `setNumNulls` / `setNumValues` from `CometVector`, `CometDecodedVector`, and `CometPlainVector`. Mark the now-final `numNulls` / `numValues` / `hasNull` fields on `CometDecodedVector` `final`. - Remove `isReused` / `setReused` (and the 4-arg constructor) from `CometPlainVector`, and simplify the SPARK-50235 cleanup loop in `CometColumnarToRowExec` to always close per-batch vectors that are neither `WritableColumnVector` nor `ConstantColumnVector`. - Add class-level Javadoc to the remaining vector classes (`CometVector`, `CometDecodedVector`, `CometPlainVector`, `CometDictionaryVector`, `CometDictionary`, `CometListVector`, `CometMapVector`, `CometStructVector`) explaining why a custom hierarchy is needed (shaded Arrow vs. Spark's `ArrowColumnVector`, FFI lifecycle, dictionary lazy decoding). ## How are these changes tested? Existing tests cover the affected paths: - `CometExecSuite` "TopK operator should return correct results on dictionary column with nulls" exercises `CometDictionaryVector` end-to-end. - `CometExecSuite` "CometBroadcastExchange could be converted to rows using CometColumnarToRow" exercises the simplified `CometColumnarToRowExec` cleanup loop. - `CometExecSuite` "project + filter on arrays" exercises `CometListVector` through native execution. - `TestColumnReader` constructs `CometPlainVector` directly via the surviving 2-arg constructor. All four pass locally. Native `cargo build` and `cargo clippy --all-targets --workspace -- -D warnings` are clean. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
