zhuqi-lucas opened a new pull request, #23128:
URL: https://github.com/apache/datafusion/pull/23128

   PR 2 of the EPIC #22715 sequence. Builds on the dispatcher refactor + 
factory framework that landed in PR 1 (#22751) to bring the first nested type 
into \`GroupValuesColumn\`, unblocking the rest of the sequence (PR 3 Struct, 
PR 4 List, PR 5 LargeList, PR 6 composite FSL, PR 7 Map).
   
   ## Which issue does this PR close?
   
   Part of #22682 (nested type coverage) and #22715 (EPIC).
   
   ## Rationale for this change
   
   Today a single \`FixedSizeList<primitive>\` column in a GROUP BY drags the 
whole grouping onto the byte-encoded \`GroupValuesRows\` fallback, even when 
every other column would have qualified for the column-wise + cross-column 
short-circuit fast path in \`GroupValuesColumn\`. With the recursive 
\`make_group_column\` factory + \`group_column_supported_type\` allow-list in 
place from PR 1, this PR is now a self-contained addition of one builder and 
two dispatcher entries.
   
   ## What changes are included in this PR?
   
   - \`FixedSizeListGroupValueBuilder<T: ArrowPrimitiveType>\` in a new 
\`fixed_size_list\` submodule. Storage: outer null bitmap + a child 
\`PrimitiveGroupValueBuilder<T, true>\` that holds every element flat (length = 
\`outer_len * list_len\`). Element \`j\` of outer row \`i\` lives at child 
index \`i * list_len + j\`.
   - \`group_column_supported_type\` accepts \`FixedSizeList<primitive>\` for 
the primitive subset wired through the dispatcher (Int8..Int64, UInt8..UInt64, 
Float32, Float64, Date32, Date64). Composite children (Struct / List inside 
FSL) are deferred to a follow-up after their respective builders land.
   - \`make_group_column\` dispatches \`DataType::FixedSizeList(...)\` to the 
appropriate \`FixedSizeListGroupValueBuilder<T>\` via an \`instantiate_fsl!\` 
macro mirroring the existing \`instantiate_primitive!\` pattern.
   
   ## Are these changes tested?
   
   Yes. 12 new unit tests on \`FixedSizeListGroupValueBuilder\` plus extensions 
to the existing \`group_column_supported_type\` ⇔ \`make_group_column\` 
consistency fuzz.
   
   Per-builder:
   - append / build round trip with mixed outer-null and inner-null rows
   - \`equal_to\` for identical / different / outer-null / inner-null rows
   - \`take_n\` boundary cases (\`n=0\`, \`n=len\`, prefix containing null rows)
   - sliced input array (offset != 0)
   - \`vectorized_equal_to\` / \`vectorized_append\` match per-row reference
   - \`size()\` grows with appends
   - \`build\` on empty builder returns empty array
   - end-to-end dispatcher → \`GroupValuesColumn\` routing
   
   Consistency fuzz now covers four primitive FSL children (signed, unsigned, 
float, date) on the supported side and three non-primitive FSL children (Utf8, 
Decimal128, Boolean) on the rejected side, locking in the scope boundary for 
this PR.
   
   127/127 aggregates tests pass, clippy clean, fmt clean.
   
   ## What follows in the EPIC
   
   - PR 3: \`Struct<...>\` builder + dispatcher.
   - PR 4: \`List<T>\` builder + dispatcher (recursive child via factory).
   - PR 5: \`LargeList<T>\` builder + dispatcher.
   - PR 6: Relax FSL child restriction to allow \`FSL<Struct>\` / \`FSL<List>\` 
once the prerequisite child builders are in.
   - PR 7: \`Map<K,V>\` (\`List<Struct<keys, values>>\` Arrow representation).
   
   ## Are there any user-facing changes?
   
   No behavior change for users whose GROUP BY did not previously contain a 
\`FixedSizeList<primitive>\` column. For users who did, the grouping now uses 
the column-native fast path instead of falling back to \`GroupValuesRows\` — 
same results, less memory and CPU.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to