Jeadie opened a new pull request, #10236:
URL: https://github.com/apache/arrow-rs/pull/10236

   # Which issue does this PR close?
   
   <!--
   We generally require a GitHub issue to be filed for all bug fixes and 
enhancements and this helps us generate change logs for our releases. You can 
link an issue to this PR using the GitHub syntax.
   -->
   
   - No issue yet. Happy to open if changes wanted. 
   
   # Rationale for this change
   
   <!--
   Why are you proposing this change? If this is already explained clearly in 
the issue then this section is not needed.
   Explaining clearly why changes are proposed helps reviewers understand your 
changes and offer better suggestions for fixes.
   -->
   This PR improves the performance of `FilterPredicate::filter` for array 
based data types, specifically: `List<T>`, `FixedSizeList<T>`, `Map<T>`. 
   
   This optimisation is based on one idea: translate retained (i.e. filter = 
true) parent-row runs into child element ranges (trivially contiguous due to 
how list/fixed/map layouts work), then hand those ranges to a already-fast 
child kernels rather than copying element-by-element.
   
   `filter` is one of the most-executed kernels in Apache DataFusion, and now 
these list/nested types have  fast path. Several common Datafusion uses are 
especially impacted:
    - `FixedSizeList` as embeddings or vectors
    - JSON or nested data (`array_agg`, `Unnest`)
    - `GROUP BY` operations
    - Hash/sort-merge joins filter probe/build columns on list types
   
   
   # What changes are included in this PR?
   
   <!--
   There is no need to duplicate the description in the issue here but it is 
sometimes worth providing a summary of the individual changes in this PR.
   -->
   
   1. Specialisation within `FilterPredicate::filter` for 
`DataType::FixedSizeList`, `DataType::Map`, `DataType::List` and 
`DataType::LargeList` (the latter two are only specialised for certain/most 
child types).
   2. Associated benchmarks in `arrow/benches/filter_kernels.rs`. 
   
   
   ## Changes Explained
   ### Before
   Prior to this PR, `List<T>`, used the `MutableArrayData` fallback.
   
   Example
   ```
   filter:       [ T    F    T    T    F  ]
   parent rows:  [row0|row1|row2|row3|row4]
   
   child values: [a b c|d e|f g h|i j k|l m n o]
   ```
   MutableArrayData walks the full child buffer, copying by range for each 
retained row.
   
   ### After
   ```
   filter:   [ T    F    T    T    F  ]
               row0 row1 row2 row3 row4
   
   offsets:  [ 0    3    5    8   11   15 ]
   
   predicate_row_ranges → [(0,1), (2,4)]   ← runs of kept rows
   
   child_ranges:
     run (0,1): offsets[0]..offsets[1] = [0,  3) ← 1 parent element
     run (2,4): offsets[2]..offsets[4] = [5, 11)  ← 2 parent elements. rows 2+3 
merged into ONE range
   
   child values: [a b c|d e|f g h|i j k|l m n o]
                   ╰─────╯   ╰───────────╯
                   [0, 3)       [5, 11)
   
   Rebuild new offsets from retained row lengths:
     row0: 3-0=3  → new_offsets: [0, 3]
     row2: 8-5=3  → new_offsets: [0, 3, 6]
     row3: 11-8=3 → new_offsets: [0, 3, 6, 9]
   
   output: List([ [a,b,c], [f,g,h], [i,j,k] ])
   ```
   
   ### Non-specialised Child types.
   
   `List<T>` is not specialised for some `T` child types (and similarly other 
array types mentioned). This PR specialises if the child type `T` has a fast, 
vectorized kernel for it that is driven only by the predicate's `Slices` (never 
reads `filter` directly). Everything else uses the well-tuned, correct 
`MutableArrayData` fallback.
   
   | Child type | Why it stays on fallback |
   |---|---|
   | dense `Union` | Consecutive rows carry different type-ids and 
non-contiguous child offsets, no contiguous ranges to copy |
   | `RunEndEncoded` | Its kernel (`filter_run_end_array`) reads 
`predicate.filter` **directly**. The specialization streams ranges via a 
`Slices` predicate whose `filter` is intentionally empty. |
   | **unmeasured exotics** | Several exotics not benchmarked stay on fallback 
by default. |
   
   Every other list child *is* specialized: primitives, boolean, null, 
`Utf8`/`LargeUtf8`/`Binary`/`LargeBinary`, `Utf8View`/`BinaryView`, 
`FixedSizeBinary`, `FixedSizeList`, `Dictionary`, `Struct`, sparse `Union`, 
`ListView`/`LargeListView`, and nested `List`/`LargeList`.
   
   # Are these changes tested?
   
   <!--
   We typically require tests for all PRs in order to:
   1. Prevent the code from being accidentally broken by subsequent changes
   2. Serve as another way to document the expected behavior of the code
   
   If tests are not included in your PR, please explain why (for example, are 
they covered by existing tests)?
   
   If this PR claims a performance improvement, please include evidence such as 
benchmark results.
   -->
   Tests in `arrow-select/src/filter.rs`. 
   
   ## Benchmark results
   ```shell
   >> cargo bench -p arrow --bench filter_kernels \
     --features test_utils \
     --baseline after \
     -- "filter (list|fixedsizelist|map)"
   ```
   
   - `size = 65536`
   - Cells: `before → after (speedup)`, where 
     - before = `MutableArrayData` fallback
     - after = specialized. 
     - ⚠ marks a regression (sub-1.0).
   
   ### `List<T>` by child type
   
   | Child | kept ½ | kept 1023/1024 | kept 1/1024 |
   |---|---|---|---|
   | Int32 | 433→373 µs (1.16×) | 128→93 µs (1.38×) | 2.92→1.12 µs (2.60×) |
   | Utf8 | 627→588 µs (1.07×) | 455→471 µs (0.97× ⚠) | 3.57→1.44 µs (2.48×) |
   | LargeUtf8 | 679→593 µs (1.14×) | 644→474 µs (1.36×) | 3.58→1.44 µs (2.49×) 
|
   | Binary | 623→581 µs (1.07×) | 451→469 µs (0.96× ⚠) | 3.49→1.44 µs (2.43×) |
   | LargeBinary | 656→590 µs (1.11×) | 628→470 µs (1.34×) | 3.58→1.44 µs 
(2.48×) |
   | Utf8View | 598→315 µs (1.90×) | 566→183 µs (3.10×) | 3.16→0.94 µs (3.35×) |
   | FixedSizeBinary | 464→348 µs (1.33×) | 213→118 µs (1.80×) | 2.87→1.03 µs 
(2.79×) |
   | FixedSizeList | 540→446 µs (1.21×) | 216→134 µs (1.61×) | 3.61→1.42 µs 
(2.54×) |
   | Dictionary | 482→370 µs (1.30×) | 126→94 µs (1.34×) | 3.36→1.37 µs (2.45×) 
|
   | Struct | 519→371 µs (1.40×) | 128→93 µs (1.37×) | 3.69→1.12 µs (3.28×) |
   | Map | 1107→998 µs (1.11×) | 780→648 µs (1.20×) | 6.32→2.89 µs (2.19×) |
   | Union (sparse) | 832→663 µs (1.26×) | 324→168 µs (1.93×) | 5.41→2.30 µs 
(2.35×) |
   | ListView | 1702→548 µs (3.11×) | 2687→128 µs (20.9×) | 5.85→1.41 µs 
(4.13×) |
   | nested `List<List>` | 612→620 µs (0.99× ⚠) | 320→295 µs (1.09×) | 
3.93→1.73 µs (2.27×) |
   
   ### Direct kernels (new)
   
   | Kernel | kept ½ | kept 1023/1024 | kept 1/1024 |
   |---|---|---|---|
   | `filter FixedSizeList` | 309→197 µs (1.57×) | 20.8→21.6 µs (0.96× ⚠) | 
3.27→0.69 µs (4.76×) |
   | `filter Map` | 791→648 µs (1.22×) | 286→291 µs (0.98×) | 5.01→1.88 µs 
(2.66×) |
   
   ### Value-length sweep — `List<Utf8>` @ kept ½
   
   | Value length | before → after | speedup |
   |---|---|---|
   | 8 B | 669→591 µs | 1.13× |
   | 64 B | 1241→1074 µs | 1.16× |
   | 256 B | 4796→3167 µs | 1.51× |
   | 10× rows (short) | 6598→6446 µs | 1.02× |
   
   ### Regressions / caveats
   
   All sub-1.0 results occur only at the dense `kept 1023/1024` end (rare for 
selective predicates), plus the nested-list `½` tie:
   - **dense:** Utf8 0.97×, Binary 0.96×, `filter FixedSizeList` 0.96× 
(memcpy-bound — the fallback is already tight there).
   - **nested `List<List>` @ ½: 0.99×** (offset-dominated; ties the fallback, 
then wins 1.09×/2.27× at the other selectivities).
   
   No remaining regression exceeds ~4%. Every `kept 1/1024` (highly selective) 
case is a 2.2–4.8× win.
   
   # Are there any user-facing changes?
   
   <!--
   If there are user-facing changes then we may require documentation to be 
updated before approving the PR.
   
   If there are any breaking changes to public APIs, please call them out.
   -->
   
   N/A.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to