[
https://issues.apache.org/jira/browse/ARROW-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Joris Van den Bossche updated ARROW-5682:
-----------------------------------------
Issue Type: Bug (was: Improvement)
> [Python] from_pandas conversion casts values to string inconsistently
> ---------------------------------------------------------------------
>
> Key: ARROW-5682
> URL: https://issues.apache.org/jira/browse/ARROW-5682
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 0.13.0
> Reporter: Bryan Cutler
> Priority: Minor
>
> When calling {{pa.Array.from_pandas}} with primitive data as input and casting
> to string with "type=pa.string()", the resulting pyarrow Array can have
> inconsistent values. For most input types the result is an empty string, but
> for some types (int32, int64) the values are '\x01' etc.
> {noformat}
> In [8]: s = pd.Series([1, 2, 3], dtype=np.uint8)
> In [9]: pa.Array.from_pandas(s, type=pa.string())
>
> Out[9]:
> <pyarrow.lib.StringArray object at 0x7f90b6091a48>
> [
> "",
> "",
> ""
> ]
> In [10]: s = pd.Series([1, 2, 3], dtype=np.uint32)
>
> In [11]: pa.Array.from_pandas(s, type=pa.string())
>
> Out[11]:
> <pyarrow.lib.StringArray object at 0x7f9097efca48>
> [
> "",
> "",
> ""
> ]
> {noformat}
> This came from the Spark discussion at
> https://github.com/apache/spark/pull/24930/files#r296187903
> Type casting this way is not supported in Spark, but it would be good to make
> the behavior consistent. Would it be better to raise an UnsupportedOperation
> error?