Sascha Hofmann created ARROW-6934:
-------------------------------------
Summary: [Python] Choose string column encoding in csv reader
Key: ARROW-6934
URL: https://issues.apache.org/jira/browse/ARROW-6934
Project: Apache Arrow
Issue Type: Wish
Reporter: Sascha Hofmann
I was wondering whether it is possible to specify a different encoding for
string columns in the options of the pyarrow CSV reader. I saw that there is a
check (check_utf8, in ConvertOptions) for whether a column is UTF-8 encoded. By
default, it seems that if this check fails, the column is interpreted as binary.
Is there any way to have a fallback option, i.e. if the UTF-8 check fails, try
latin-1 before falling back to binary?
--
This message was sent by Atlassian Jira
(v8.3.4#803005)