[ 
https://issues.apache.org/jira/browse/ARROW-2205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16377068#comment-16377068
 ] 

ASF GitHub Bot commented on ARROW-2205:
---------------------------------------

cpcloud commented on a change in pull request #1650: ARROW-2205: [Python] Option for integer object nulls
URL: https://github.com/apache/arrow/pull/1650#discussion_r170637799
 
 

 ##########
 File path: python/pyarrow/tests/test_convert_pandas.py
 ##########
 @@ -615,6 +615,36 @@ def test_int_object_nulls(self):
         _check_pandas_roundtrip(df, expected=expected,
                                 expected_schema=schema)
 
+    def test_int_object_nulls_option(self):
 
 Review comment:
   It doesn't look like you're using `self` here. Can you make this into a test function and [`pytest.mark.parametrize`](https://docs.pytest.org/en/latest/parametrize.html) it on the `int_dtypes` parameter?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Python] Option for integer object nulls
> ----------------------------------------
>
>                 Key: ARROW-2205
>                 URL: https://issues.apache.org/jira/browse/ARROW-2205
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: C++, Python
>    Affects Versions: 0.8.0
>            Reporter: Albert Shieh
>            Priority: Major
>              Labels: pull-request-available
>
> I have a use case where the loss of precision in casting integers to floats 
> matters, and pandas supports storing integers with nulls without loss of 
> precision in object columns. However, a roundtrip through arrow will cast the 
> object columns to float columns, even though the object columns are stored in 
> arrow as integers with nulls.
> This is a minimal example demonstrating the behavior of a roundtrip:
> {code}
> import numpy as np
> import pandas as pd
> import pyarrow as pa
> df = pd.DataFrame({"a": np.array([None, 1], dtype=object)})
> df_pa = pa.Table.from_pandas(df).to_pandas()
> print(df)
> print(df_pa)
> {code}
> The output is:
> {code}
>       a
> 0  None
> 1     1
>      a
> 0  NaN
> 1  1.0
> {code}
> This seems to be the desired behavior, given test_int_object_nulls in 
> test_convert_pandas.
> I think it would be useful to add an option in the to_pandas methods to allow 
> integers with nulls to be returned as object columns. The option can default 
> to false in order to preserve the current behavior.
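
The precision concern in the description above can be checked without Arrow at all. The following is a small illustrative sketch (the value `big` is chosen here for demonstration and does not come from the issue): a 64-bit float has a 53-bit significand, so integers above 2**53 are not all representable, while a Python object keeps them exact.

```python
# Smallest positive integer that a float64 cannot represent exactly.
big = 2**53 + 1

# Casting to float rounds to the nearest representable value.
as_float = float(big)
lost_precision = int(as_float) != big

# Kept as Python objects (what a pandas object column holds), the value
# and the null both survive unchanged.
as_objects = [None, big]
kept_exact = as_objects[1] == big

print(lost_precision, kept_exact)
```

This is the case in which returning integers with nulls as an object column, rather than as floats, makes a visible difference.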



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
