hypsakata opened a new pull request, #48699:
URL: https://github.com/apache/arrow/pull/48699

   
   ### Rationale for this change
   
   When building an `Arrow::Table` from a Ruby Hash passed to `Arrow::Table.new`, nested Integer arrays are incorrectly inferred as `list<uint8>` or `list<int8>` regardless of the values they actually contain. Nested Integer arrays should instead be inferred as the appropriate list type (e.g., `list<int64>`, `list<uint64>`) based on their values, in the same way flat arrays are handled, unless they contain values that are out of range for every integer type.
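
The following Ruby sketch makes the expected rule concrete. It is self-contained and rests on two assumptions. First, the rule is "pick the narrowest signed or unsigned integer type whose range covers every value". Second, a nested Array is inferred from the values of all its sub-arrays together. `infer_integer_type` and `infer_nested_integer_type` are hypothetical helpers written for illustration. They are not red-arrow APIs and do not reproduce `detect_builder_info()`.

```ruby
# Hypothetical helpers, not red-arrow's API. Each table entry is
# [type name, lowest value, highest value].
SIGNED_INT_TYPES = [8, 16, 32, 64].map { |bits|
  ["int#{bits}", -(2**(bits - 1)), 2**(bits - 1) - 1]
}.freeze
UNSIGNED_INT_TYPES = [8, 16, 32, 64].map { |bits|
  ["uint#{bits}", 0, 2**bits - 1]
}.freeze

# Return the name of the narrowest integer type that can hold every
# non-nil value, or nil if there are no values or no type is wide enough.
def infer_integer_type(values)
  min, max = values.compact.minmax
  return nil if min.nil? # nothing to infer from

  # Non-negative values use unsigned types; any negative value needs a signed one.
  candidates = min.negative? ? SIGNED_INT_TYPES : UNSIGNED_INT_TYPES
  name, = candidates.find { |_name, low, high| low <= min && max <= high }
  name
end

# Assumption: a nested Array's element type comes from all sub-arrays together.
def infer_nested_integer_type(rows)
  element = infer_integer_type(rows.compact.flatten)
  element && "list<#{element}>"
end
```

Under these assumptions, `infer_integer_type([1, 2, 300])` returns `"uint16"`. `infer_nested_integer_type([[1, 2], [-3, 2**40]])` returns `"list<int64>"`. `[[2**64]]` yields `nil`, because no 64-bit integer type can hold `2**64`.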
   
   ### What changes are included in this PR?
   
   This PR modifies the logic in `detect_builder_info()` to fix the inference 
issue. Specifically:
   
   - **Persist `sub_builder_info` across sub-array elements**: Previously, `sub_builder_info` was recreated for each sub-array element of the Array, so information gathered from earlier sub-arrays was discarded. The logic now accumulates the builder information and carries it over across elements, so that the type is inferred from the entire list.
   - **Refactor Integer builder logic**: Following the pattern used for `BigDecimal`, the logic that determines the Integer builder has been moved to `create_builder()`, which `detect_builder_info()` now calls.
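
The sketch below contrasts recreating `sub_builder_info` for each sub-array with persisting it. It is simplified and hypothetical: a `{min:, max:}` Hash stands in for the real builder information, and it tracks only the running minimum and maximum Integer values. `merge_sub_array`, `detect_range_resetting`, and `detect_range_persisting` are illustrative names, not red-arrow code.

```ruby
# Hypothetical sketch, not red-arrow's implementation: a {min:, max:} Hash
# stands in for the builder information gathered from nested Integer Arrays.

# Fold every non-nil value of one sub-array into `info` and return the result.
def merge_sub_array(info, sub_array)
  sub_array.compact.each do |value|
    info = if info.nil?
             {min: value, max: value}
           else
             {min: [info[:min], value].min, max: [info[:max], value].max}
           end
  end
  info
end

# Buggy pattern: the info is recreated for every sub-array, so only the
# last (non-nil) sub-array is reflected in the result.
def detect_range_resetting(rows)
  info = nil
  rows.compact.each { |sub_array| info = merge_sub_array(nil, sub_array) }
  info
end

# Fixed pattern: the accumulated info is carried over from one sub-array
# to the next.
def detect_range_persisting(rows)
  rows.compact.reduce(nil) { |info, sub_array| merge_sub_array(info, sub_array) }
end
```

Take `rows = [[2**40, -5], nil, [1, 2]]`. The resetting variant returns `{min: 1, max: 2}`, a range that fits in `uint8`. The persisting variant returns `{min: -5, max: 2**40}`. Under a narrowest-type rule, that range needs a 64-bit signed type.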
   
   **Note:**
   
   - As a side effect of this refactoring, nested lists of `BigDecimal` (which were previously inferred as `string`) may now have their types inferred. However, comprehensive testing and verification of nested `BigDecimal` support will be addressed in a separate issue to keep this PR focused.
   - We stopped using `IntArrayBuilder` in the inference logic to ensure correctness. This adds performance overhead: array building is approximately 2x slower, as the benchmark below shows, because we can no longer rely on the specialized builder's detection.
   
   ```text
                                              user     system      total        real
       array_builder int32 100000         0.085867   0.000194   0.086061 (  0.086369)
   int_array_builder int32 100000         0.042163   0.001033   0.043196 (  0.043268)
       array_builder int64 100000         0.086799   0.000015   0.086814 (  0.086828)
   int_array_builder int64 100000         0.044493   0.000973   0.045466 (  0.045469)
       array_builder uint32 100000        0.085748   0.000009   0.085757 (  0.085768)
   int_array_builder uint32 100000        0.044463   0.001034   0.045497 (  0.045498)
       array_builder uint64 100000        0.084548   0.000987   0.085535 (  0.085537)
   int_array_builder uint64 100000        0.044206   0.000017   0.044223 (  0.044225)
   ```
   
   ### Are these changes tested?
   
   Yes. The tests can be run with `ruby ruby/red-arrow/test/run-test.rb`.
   
   ### Are there any user-facing changes?
   
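   Yes. Nested Integer arrays passed to `Arrow::Table.new` may now be inferred as a different list type than before.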
   
   GitHub Issue: #48481


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
