[PR] GH-43112: [Python] Set nullable `Int64` `dtype` for integer columns with `None` values when converting to pandas [arrow]

via GitHub Sat, 26 Oct 2024 06:41:33 -0700


attwelveDev opened a new pull request, #44538:
URL: https://github.com/apache/arrow/pull/44538


   ### Rationale for this change
   When calling the `to_pandas` method on a `Table` object, integer columns with 
at least one `None` value are converted to `float64` in the resulting pandas 
`DataFrame`. For example, given a `Table` built with `from_pydict` from the 
Python `dict` `{"col_name": [1, None]}`, `to_pandas` returns a `DataFrame` 
where the `dtype` of `"col_name"` is `float64` and the values are `[1.0, NaN]`. 
This can lose precision, because `float64` cannot exactly represent every 
integer whose magnitude exceeds 2^53.
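   The precision concern can be illustrated without pyarrow at all. The sketch 
below uses only plain Python (whose `float` is a 64-bit IEEE 754 double) and an 
arbitrarily chosen value, `2**53 + 1`, to show an integer that fits in `int64` 
but does not survive a round trip through a float:

```python
# A float64 has a 53-bit significand, so integers beyond 2**53 in magnitude
# are not all exactly representable.
big = 2**53 + 1             # fits comfortably in a signed 64-bit integer

as_float = float(big)       # what a float64 column would hold for this value
round_trip = int(as_float)  # convert back to an integer

print(big)         # 9007199254740993
print(round_trip)  # 9007199254740992
```

The two printed values differ by one, which is the kind of silent change a 
`float64` fallback can introduce for large identifiers or counters.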
   
   ### What changes are included in this PR?
   In the `table_to_dataframe` method, columns that have the `int64` `dtype` and 
contain at least one `None` value are now assigned the `Int64` `dtype` in 
`ext_columns_dtypes`.
   
   Various existing tests were modified to reflect this new behaviour. 
   
   ### Are these changes tested?
   Tests have been added as part of `test_pandas_dtype_conversions` in 
`/python/pyarrow/tests/test_pandas.py`.
   
   ### Are there any user-facing changes?
   This may affect code that expects integer columns with `None` values to be 
converted to `float64`. 
   
   ### Additional Notes
   There are failing CI tests for other languages, possibly due to unrelated 
issues from the upstream main branch. All CI tests for Python have been 
verified to pass. 
   
   This is my first contribution to an open source project, and I appreciate 
any feedback. 
   
   - GitHub issue: #43112 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

