[ https://issues.apache.org/jira/browse/ARROW-2101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16436517#comment-16436517 ]
ASF GitHub Bot commented on ARROW-2101:
---------------------------------------
BryanCutler commented on issue #1886: Bug fix for ARROW-2101
URL: https://github.com/apache/arrow/pull/1886#issuecomment-380981795
Just so I have this straight: the old behavior was that when the user specified
an explicit type of `pa.string()` and a binary object was found, the conversion
would fall back to `BinaryArray` and continue. This PR changes it to try to decode
the object as UTF-8 and raise an error if that fails, but only when the type is specified?
Does anyone know if there was a reason to fall back in this case? I think
this change makes sense, but I just want to make sure we are not breaking
anything.
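To make the two behaviors concrete, here is a minimal pure-Python model of the decision for a column declared as `pa.string()`. It is a sketch under stated assumptions, not pyarrow's converter: a column is modeled as a plain list, Python 3 `bytes` stands in for a Python 2 `str`, and the function name `convert_declared_string` and its `strict` flag are hypothetical.

```python
# Hypothetical model of the behavior discussed in this PR; NOT pyarrow code.
# Python 3 `bytes` plays the role of a Python 2 `str` object here.


def convert_declared_string(values, strict=True):
    """Convert a list of values for a column declared as a string type.

    strict=False models the old behavior: once a bytes object is seen,
    the whole column falls back to binary.
    strict=True models the proposed behavior: bytes objects are decoded
    as UTF-8, and undecodable bytes raise UnicodeDecodeError.
    None is kept as a null in both modes.
    Returns a (kind, converted_values) tuple.
    """
    if not strict and any(isinstance(v, bytes) for v in values):
        # Old behavior: silently produce a binary column instead.
        return "binary", [v if v is None or isinstance(v, bytes)
                          else v.encode("utf-8") for v in values]
    converted = []
    for v in values:
        if isinstance(v, bytes):
            v = v.decode("utf-8")  # raises UnicodeDecodeError on bad data
        converted.append(v)
    return "string", converted


print(convert_declared_string([b"a"], strict=False))  # ('binary', [b'a'])
print(convert_declared_string([b"a"]))                # ('string', ['a'])
```

The point of the `strict=True` path is that an explicit type is treated as a contract: invalid UTF-8 surfaces as an error rather than quietly changing the column's type.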
Also, this doesn't change anything for Python 2 when using 'str' objects
without a specified type: it will still create a `BinaryArray`. Is this what
we want?
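For the no-type case, a similarly hypothetical sketch of the inference rule as described above: `bytes` values (Python 2 `str`) infer to binary and text values (Python 2 `unicode`) infer to string. The name `infer_kind` is illustrative only, and rejecting a mix of bytes and text is an assumption made to keep the sketch short, not a statement about pyarrow.

```python
# Hypothetical sketch of type inference when no type is given; NOT pyarrow code.
# Python 3 `bytes` stands in for Python 2 `str`, and `str` for `unicode`.


def infer_kind(values):
    """Return "binary", "string", or "null" for a list of values."""
    kinds = set()
    for v in values:
        if v is None:
            continue  # nulls do not affect inference
        if isinstance(v, bytes):
            kinds.add("binary")
        elif isinstance(v, str):
            kinds.add("string")
        else:
            raise TypeError("unsupported value: %r" % (v,))
    if len(kinds) > 1:
        # Assumption for this sketch only: mixed bytes/text is rejected.
        raise ValueError("mixed bytes and text values")
    # No non-null values means there is nothing to infer from.
    return kinds.pop() if kinds else "null"


print(infer_kind([b"a"]))  # binary
print(infer_kind([u"a"]))  # string
```

Under a rule like this, a Python 2 `pd.Series(['a'])` keeps inferring binary; changing that would mean changing inference itself, not just the explicit-type path.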
----------------------------------------------------------------
> [Python] from_pandas reads 'str' type as binary Arrow data with Python 2
> ------------------------------------------------------------------------
>
> Key: ARROW-2101
> URL: https://issues.apache.org/jira/browse/ARROW-2101
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 0.8.0
> Reporter: Bryan Cutler
> Assignee: Bryan Cutler
> Priority: Major
> Labels: pull-request-available
>
> Using Python 2, converting Pandas data of 'str' type to Arrow results in Arrow
> data of binary type, even if the user supplies type information. Conversion
> of 'unicode' data works and creates Arrow data of string type. For example:
> {code}
> In [25]: pa.Array.from_pandas(pd.Series(['a'])).type
> Out[25]: DataType(binary)
> In [26]: pa.Array.from_pandas(pd.Series(['a']), type=pa.string()).type
> Out[26]: DataType(binary)
> In [27]: pa.Array.from_pandas(pd.Series([u'a'])).type
> Out[27]: DataType(string)
> {code}
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)