[ 
https://issues.apache.org/jira/browse/ARROW-2101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16436517#comment-16436517
 ] 

ASF GitHub Bot commented on ARROW-2101:
---------------------------------------

BryanCutler commented on issue #1886: Bug fix for ARROW-2101
URL: https://github.com/apache/arrow/pull/1886#issuecomment-380981795
 
 
   Just so I have this straight: the old behavior was that when the user 
specified an explicit type such as `pa.string()` and a binary object was found, 
it would fall back to `BinaryArray` and continue.  This change instead tries to 
convert the object to UTF-8 and raises an error if that fails, but only if the 
type is specified?
   
   Does anyone know if there was a reason to fall back in this case?  I think 
this change makes sense, but I just want to make sure we are not breaking 
anything.
   
   Also, this doesn't change anything for Python 2 when using 'str' objects 
and the type is not specified; it will still create a `BinaryArray`.  Is this 
what we want?
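
   To make the question concrete, here is a minimal pure-Python sketch of the 
rule as I understand it.  It is only a toy model, not pyarrow code: the 
function name `convert_values` and the plain "string"/"binary" labels are 
made up for illustration, and Python 3 `bytes` stands in for Python 2 'str'.

```python
def convert_values(values, explicit_type=None):
    """Toy model of the conversion rule discussed above (hypothetical helper).

    Returns a (type_label, converted_values) tuple.
    """
    if explicit_type == "string":
        converted = []
        for value in values:
            if isinstance(value, bytes):
                # Proposed behavior: decode as UTF-8 instead of falling back
                # to binary. Invalid input raises UnicodeDecodeError.
                value = value.decode("utf-8")
            converted.append(value)
        return "string", converted

    # No explicit type: any bytes object still yields a binary result,
    # which is the unchanged Python 2 'str' case.
    if any(isinstance(value, bytes) for value in values):
        as_bytes = [
            value if isinstance(value, bytes) else value.encode("utf-8")
            for value in values
        ]
        return "binary", as_bytes
    return "string", list(values)


inferred = convert_values([b"a"])
explicit = convert_values([b"a"], explicit_type="string")
```

   In this model the inferred case stays binary, the explicit case becomes 
string, and bytes that are not valid UTF-8 raise an error only when the type 
is given explicitly.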


> [Python] from_pandas reads 'str' type as binary Arrow data with Python 2
> ------------------------------------------------------------------------
>
>                 Key: ARROW-2101
>                 URL: https://issues.apache.org/jira/browse/ARROW-2101
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.8.0
>            Reporter: Bryan Cutler
>            Assignee: Bryan Cutler
>            Priority: Major
>              Labels: pull-request-available
>
> Using Python 2, converting Pandas data of 'str' type to Arrow results in Arrow 
> data of binary type, even if the user supplies type information.  Conversion 
> of 'unicode' type works and creates Arrow data of string type.  For example:
> {code}
> In [25]: pa.Array.from_pandas(pd.Series(['a'])).type
> Out[25]: DataType(binary)
> In [26]: pa.Array.from_pandas(pd.Series(['a']), type=pa.string()).type
> Out[26]: DataType(binary)
> In [27]: pa.Array.from_pandas(pd.Series([u'a'])).type
> Out[27]: DataType(string)
> {code}



