hypsakata opened a new pull request, #48699:
URL: https://github.com/apache/arrow/pull/48699
### Rationale for this change
When building an `Arrow::Table` from a Ruby Hash passed to
`Arrow::Table.new`, nested Integer arrays are incorrectly inferred as
`list<uint8>` or `list<int8>` regardless of the actual values contained. Nested
integer arrays should be correctly inferred as the appropriate list type (e.g.,
`list<int64>`, `list<uint64>`) based on their values, similar to how flat
arrays are handled, unless they contain values out of range for any integer
type.
### What changes are included in this PR?
This PR modifies the logic in `detect_builder_info()` to fix the inference
issue. Specifically:
- **Persist `sub_builder_info` across sub-array elements**: Previously,
`sub_builder_info` was recreated for each sub-array element in the Array. The
logic has been updated to accumulate and carry over the builder information
across elements to ensure correct type inference for the entire list.
- **Refactor Integer builder logic**: Following the pattern used for
`BigDecimal`, the logic for determining the Integer builder has been moved to
`create_builder()`. `detect_builder_info()` now calls this function.
**Note:**
- As a side effect of this refactoring, nested lists of `BigDecimal` (which
were previously inferred as `string`) may now have their types inferred.
However, comprehensive testing and verification for nested `BigDecimal` support
will be addressed in a separate issue to keep this PR focused.
- We stopped using `IntArrayBuilder` for inference logic to ensure
correctness. This results in a performance overhead (array building is
approximately 2x slower) as we can no longer rely on the specialized builder's
detection.
```text
user system total
real
array_builder int32 100000 0.085867 0.000194 0.086061 (
0.086369)
int_array_builder int32 100000 0.042163 0.001033 0.043196 (
0.043268)
array_builder int64 100000 0.086799 0.000015 0.086814 (
0.086828)
int_array_builder int64 100000 0.044493 0.000973 0.045466 (
0.045469)
array_builder uint32 100000 0.085748 0.000009 0.085757 (
0.085768)
int_array_builder uint32 100000 0.044463 0.001034 0.045497 (
0.045498)
array_builder uint64 100000 0.084548 0.000987 0.085535 (
0.085537)
int_array_builder uint64 100000 0.044206 0.000017 0.044223 (
0.044225)
```
### Are these changes tested?
Yes. `ruby ruby/red-arrow/test/run-test.rb`
### Are there any user-facing changes?
Yes.
Github Issue: #48481
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]