adriangb opened a new pull request, #9607:
URL: https://github.com/apache/arrow-rs/pull/9607

   ## Summary
   
   - Fix `MutableArrayData::extend_nulls`, which previously panicked unconditionally for both sparse and dense Union arrays
   - For sparse unions: append the first declared type_id for each null slot and extend nulls in all children
   - For dense unions: append the first declared type_id, compute offsets into the first child, and extend nulls in that child only
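The two strategies above can be illustrated with a simplified, hypothetical model of the buffers that `MutableArrayData` accumulates for a union. `Child`, `UnionBuilder` and their fields are invented for this sketch (children track only validity), so it is not the arrow-rs implementation. It shows the bookkeeping under those assumptions: one type id per new slot, nulls in every child for a sparse union, and offsets pointing at fresh null slots of the first child for a dense union.

```rust
use std::iter;

/// Hypothetical child column: only validity is modelled (`false` = null).
#[derive(Debug, Default, Clone)]
pub struct Child {
    pub validity: Vec<bool>,
}

impl Child {
    pub fn len(&self) -> usize {
        self.validity.len()
    }

    /// Append `n` null slots to this child.
    pub fn extend_nulls(&mut self, n: usize) {
        self.validity.extend(iter::repeat(false).take(n));
    }
}

/// Hypothetical stand-in for the union buffers built by `MutableArrayData`.
#[derive(Debug)]
pub struct UnionBuilder {
    /// Type ids in field-declaration order (may be non-contiguous, e.g. [5, 2]).
    pub declared_type_ids: Vec<i8>,
    /// One child per declared type id, in the same order.
    pub children: Vec<Child>,
    /// Type id buffer: one entry per union slot.
    pub type_ids: Vec<i8>,
    /// Offsets buffer: `Some` for dense unions, `None` for sparse ones.
    pub offsets: Option<Vec<i32>>,
}

impl UnionBuilder {
    pub fn new(declared_type_ids: Vec<i8>, dense: bool) -> Self {
        let children = vec![Child::default(); declared_type_ids.len()];
        let offsets = if dense { Some(Vec::new()) } else { None };
        Self { declared_type_ids, children, type_ids: Vec::new(), offsets }
    }

    /// Append `n` null slots, all tagged with the first declared type id.
    pub fn extend_nulls(&mut self, n: usize) {
        let first = self.declared_type_ids[0];
        self.type_ids.extend(iter::repeat(first).take(n));
        match &mut self.offsets {
            // Sparse: every child has one slot per union slot, so all must grow.
            None => {
                for child in &mut self.children {
                    child.extend_nulls(n);
                }
            }
            // Dense: only the first child grows; new offsets index its new null slots.
            Some(offsets) => {
                let start = self.children[0].len() as i32;
                offsets.extend((0..n as i32).map(|i| start + i));
                self.children[0].extend_nulls(n);
            }
        }
    }
}
```

In this model, a dense union that already holds one value in its first child gets offsets `[0, 1, 2]` after `extend_nulls(2)`, while the other children stay untouched.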
   
   ## Background
   
   This bug was discovered via DataFusion. `CaseExpr` uses `MutableArrayData` via `scatter()` to build result arrays. When a `CASE` expression returns a Union type (e.g., from `json_get`, which returns a JSON union) and some rows match no `WHEN` branch (the implicit `ELSE NULL`), `scatter` calls `extend_nulls`, which panics with "cannot call extend_nulls on UnionArray as cannot infer type".
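The call path described above can be sketched with a hypothetical, simplified `scatter`. The `ArrayBuilder` trait and `VecBuilder` type are invented for illustration and only mimic the shape of the `extend`/`extend_nulls` calls on `MutableArrayData`; this is not DataFusion's actual code. The point it shows: every run of `false` mask values (rows where no `WHEN` matched) becomes an `extend_nulls` call, which is the call that panicked for union-typed results.

```rust
use std::iter;

/// Hypothetical minimal stand-in for the builder operations `scatter` needs.
pub trait ArrayBuilder {
    /// Copy source values in `start..end`.
    fn extend(&mut self, start: usize, end: usize);
    /// Append `len` null slots.
    fn extend_nulls(&mut self, len: usize);
}

/// Walk the mask in runs: `true` runs copy the next values of the source,
/// `false` runs are filled via `extend_nulls`.
pub fn scatter<B: ArrayBuilder>(mask: &[bool], builder: &mut B) {
    let mut source_pos = 0;
    let mut i = 0;
    while i < mask.len() {
        let value = mask[i];
        let run_start = i;
        while i < mask.len() && mask[i] == value {
            i += 1;
        }
        let len = i - run_start;
        if value {
            builder.extend(source_pos, source_pos + len);
            source_pos += len;
        } else {
            // For a union-typed builder, this is the call that used to panic.
            builder.extend_nulls(len);
        }
    }
}

/// Simple Int32-like builder used to observe what `scatter` does.
pub struct VecBuilder<'a> {
    pub source: &'a [i32],
    pub out: Vec<Option<i32>>,
}

impl<'a> ArrayBuilder for VecBuilder<'a> {
    fn extend(&mut self, start: usize, end: usize) {
        self.out.extend(self.source[start..end].iter().copied().map(Some));
    }

    fn extend_nulls(&mut self, len: usize) {
        self.out.extend(iter::repeat(None).take(len));
    }
}
```

For example, scattering the source `[10, 20, 30]` with the mask `[true, false, false, true, true]` yields `[Some(10), None, None, Some(20), Some(30)]`, with a single `extend_nulls(2)` call for the unmatched rows.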
   
   Any query like:
   ```sql
   SELECT CASE WHEN condition THEN json_get(col, 'key') END FROM table
   ```
   would panic if `condition` is false for any row.
   
   ## Root Cause
   
   The `extend_nulls` implementation for Union arrays panicked unconditionally, claiming it "cannot infer type". However, the Union's field definitions (child types and type IDs) are available in the `MutableArrayData`'s data type, so there is enough information to produce valid null entries by picking the first declared type_id.
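Why such entries are valid nulls follows from the Arrow columnar format: a union has no validity bitmap of its own, so a slot is null exactly when the child slot it selects is null. The sketch below models that rule; `UnionLayout` and its fields are hypothetical and simplified (children reduced to validity vectors), not the arrow-rs `UnionArray` API.

```rust
/// Hypothetical, simplified union layout (not the arrow-rs `UnionArray` API).
pub struct UnionLayout {
    /// Type ids in field-declaration order.
    pub declared_type_ids: Vec<i8>,
    /// Validity of each child, in declaration order (`false` = null).
    pub children_validity: Vec<Vec<bool>>,
    /// Type id buffer: one entry per union slot.
    pub type_ids: Vec<i8>,
    /// Offsets buffer: `Some` for dense unions, `None` for sparse ones.
    pub offsets: Option<Vec<i32>>,
}

impl UnionLayout {
    /// Slot `i` is null exactly when the child slot it selects is null.
    pub fn is_null(&self, i: usize) -> bool {
        let type_id = self.type_ids[i];
        let child = self
            .declared_type_ids
            .iter()
            .position(|&t| t == type_id)
            .expect("type id must be declared");
        let slot = match &self.offsets {
            Some(offsets) => offsets[i] as usize, // dense: follow the offset
            None => i,                            // sparse: same position
        };
        !self.children_validity[child][slot]
    }
}
```

In this model, nullness depends only on the selected child's validity, so any declared child could hold the new null slots; the first declared type_id is simply a deterministic choice.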
   
   ## Test plan
   
   - [x] Added test for sparse union `extend_nulls`
   - [x] Added test for dense union `extend_nulls`
   - [x] Existing `test_union_dense` continues to pass
   - [x] All `array_transform` tests pass
   
   🤖 Generated with [Claude Code](https://claude.com/claude-code)

