[jira] [Commented] (ARROW-7251) [Python] Open CSVs with different encodings

Sascha Hofmann (Jira) Mon, 25 Nov 2019 10:40:52 -0800


    [ 
https://issues.apache.org/jira/browse/ARROW-7251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16981785#comment-16981785
 ]


Sascha Hofmann commented on ARROW-7251:
---------------------------------------

I just ran a simple timeit benchmark in a notebook on a 30 MB CSV with seven 
string columns, one bool column, and one int column. The differences are quite 
massive.

*Pandas to pyarrow:*

314 ms ± 20 ms 

*Only pyarrow:*

38.7 ms ± 3.15 ms

We are trying to hide as much complexity as possible from the user, who might 
throw in arbitrarily old CSVs. That's why converting upfront is not really an 
option, but I guess having the pandas reader as a fallback should be 
sufficient.
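
A minimal sketch of what such a fallback could look like, assuming the 
encoding can be recognized from a byte-order mark and that files without one 
are UTF-8. The helper names (sniff_encoding, read_csv_any_encoding) are 
hypothetical and not part of pyarrow or pandas; only pyarrow.csv.read_csv, 
pandas.read_csv and pyarrow.Table.from_pandas are existing calls. BOM-less 
files go through the fast pyarrow-only path, everything else through pandas:

```python
import codecs

# Byte-order marks and the Python codec that decodes each of them.
# UTF-32 is listed before UTF-16 because the UTF-32-LE BOM begins with
# the same two bytes as the UTF-16-LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_encoding(head, default="utf-8"):
    """Guess the encoding from the first bytes of a file (hypothetical helper).

    Assumption: a file without a byte-order mark is in `default`.
    """
    for bom, name in _BOMS:
        if head.startswith(bom):
            return name
    return default


def read_csv_any_encoding(path):
    """Read a CSV into a pyarrow Table, using pandas only when needed."""
    # Imported lazily so the sniffing helper works without these packages.
    import pandas as pd
    import pyarrow as pa
    import pyarrow.csv as pa_csv

    with open(path, "rb") as f:
        encoding = sniff_encoding(f.read(4))

    if encoding == "utf-8":
        # Fast path: the "only pyarrow" case from the timings above.
        return pa_csv.read_csv(path)

    # Slow path: the "pandas to pyarrow" case, taken for non-UTF-8 input.
    df = pd.read_csv(path, encoding=encoding)
    return pa.Table.from_pandas(df)
```

Legacy single-byte encodings such as Latin-1 carry no byte-order mark, so 
this sketch would not recognize them; they would need an explicit encoding 
from the caller or a separate detection step.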

> [Python] Open CSVs with different encodings
> -------------------------------------------
>
>                 Key: ARROW-7251
>                 URL: https://issues.apache.org/jira/browse/ARROW-7251
>             Project: Apache Arrow
>          Issue Type: Wish
>          Components: Python
>            Reporter: Sascha Hofmann
>            Priority: Major
>
> I would like to open UTF-16 encoded CSVs (among others) without 
> preprocessing in, let's say, Pandas. Is there maybe a way to do this already?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
