[jira] [Commented] (ARROW-6934) [Python] Choose string column encoding in csv reader

Sascha Hofmann (Jira) Fri, 18 Oct 2019 06:18:22 -0700


    [ 
https://issues.apache.org/jira/browse/ARROW-6934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16954599#comment-16954599
 ]


Sascha Hofmann commented on ARROW-6934:
---------------------------------------

I think I get that. Does that mean that if I want to convert this to, say, a 
pandas DataFrame using to_pandas() and have the column come out as a column of 
strings, Arrow would need to mark columns that use an encoding other than utf8?

> [Python] Choose string column encoding in csv reader
> ----------------------------------------------------
>
>                 Key: ARROW-6934
>                 URL: https://issues.apache.org/jira/browse/ARROW-6934
>             Project: Apache Arrow
>          Issue Type: Wish
>          Components: Python
>            Reporter: Sascha Hofmann
>            Priority: Major
>
> I was wondering whether it is possible to specify a different encoding for 
> string columns in the parse options of the csv reader in pyarrow. I saw that 
> there is a check for whether a column is utf8 encoded. The default seems to 
> be that if the check fails, the column is interpreted as binary.
> Is there any way to have a fallback option, meaning that if the utf8 check 
> fails, latin-1 is tried before falling back to binary?
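
The fallback described in the issue can be sketched in plain Python, outside 
of pyarrow. This is only an illustration of the idea, not how the csv reader 
works: the helper name decode_column and its encodings parameter are 
hypothetical. It tries each encoding on the whole column and keeps the raw 
bytes only if none of them fits.

```python
from typing import List, Optional, Sequence, Union


def decode_column(
    values: Sequence[Optional[bytes]],
    encodings: Sequence[str] = ("utf-8", "latin-1"),
) -> List[Union[str, bytes, None]]:
    """Hypothetical helper: decode a column of raw bytes with fallbacks.

    Each encoding is tried in order on the whole column. The first one
    that decodes every non-null value wins. If none does, the column is
    returned unchanged, i.e. it stays binary.
    """
    for enc in encodings:
        try:
            # Nulls are passed through untouched.
            return [None if v is None else v.decode(enc) for v in values]
        except UnicodeDecodeError:
            # At least one value is not valid in this encoding; try the next.
            continue
    return list(values)


# The second value is not valid utf-8, so the column falls back to latin-1.
column = decode_column([b"abc", b"caf\xe9"])
```

Note that latin-1 maps every possible byte to a character, so with latin-1 in 
the list the binary fallback is never reached. A wrong guess would therefore 
produce garbled strings rather than an error.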



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
