[
https://issues.apache.org/jira/browse/ARROW-2059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16352023#comment-16352023
]
Jingyuan Wang commented on ARROW-2059:
--------------------------------------
Here is what I've done. I simply repeated the 1M rows to create 30M-row and
100M-row test CSV files, then ran the same process on each: read from CSV,
write as Feather, read from Feather, timing each part. I repeated the
measurement 10 times for each of the four combinations of (python-2.7,
python-3.6) x (feather-format-0.3.1, feather-format-0.4.0).
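For anyone who wants to repeat this, the procedure can be sketched roughly as
below. This is not the exact script I ran: the names time_stages and
feather_stages and the file paths are made up for illustration, and it assumes
pandas plus the feather-format package (feather.write_dataframe and
feather.read_dataframe) are installed in the environment under test.

```python
import statistics
import time


def time_stages(stages, repeats=10):
    """Run the (name, callable) stages in order `repeats` times.

    Each callable receives the previous stage's return value.
    Returns the mean wall-clock time in seconds for each stage name.
    """
    samples = {name: [] for name, _ in stages}
    for _ in range(repeats):
        value = None
        for name, fn in stages:
            start = time.perf_counter()
            value = fn(value)
            samples[name].append(time.perf_counter() - start)
    return {name: statistics.mean(times) for name, times in samples.items()}


def feather_stages(csv_path, feather_path):
    """Build the three stages: read CSV, write Feather, read Feather.

    pandas and feather are imported lazily so that time_stages itself
    only needs the standard library.
    """
    import feather
    import pandas as pd

    def write(df):
        feather.write_dataframe(df, feather_path)
        return df

    return [
        ("read csv", lambda _: pd.read_csv(csv_path)),
        ("write feather", write),
        ("read feather", lambda _: feather.read_dataframe(feather_path)),
    ]


# Hypothetical usage (paths are placeholders):
# print(time_stages(feather_stages("rows_1M.csv", "rows_1M.feather")))
```

Running the same script once per (python, feather-format) environment gives
one row of the tables for each combination.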
Processing the 100M-row file failed on my laptop (16GB memory) for every
combination except python-2.7 with feather-format-0.3.1.
The measurements for 1M rows are as follows:
||python version||feather version|| # rows||write feather||read feather||
|2.7|0.3.1|1M|0.06216781139|0.05903599262|
|2.7|0.4.0|1M|0.1335380793|0.04576666355|
|3.6|0.3.1|1M|0.07768514156|0.09041910172|
|3.6|0.4.0|1M|0.08690385818|0.05801310539|
The measurements for 30M rows are as follows:
||python version||feather version|| # rows||write feather||read feather||
|2.7|0.3.1|30M|1.747310066|2.35606482|
|2.7|0.4.0|30M|3.5653723|1.934461188|
|3.6|0.3.1|30M|2.407458949|2.811572456|
|3.6|0.4.0|30M|2.925034189|1.852504301|
From both tables, the performance of writing to Feather did degrade from 0.3.1
to 0.4.0, and the degradation is more dramatic on python2 (roughly 2x slower)
than on python3. Reading Feather files was actually faster with the newer
feather version.
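To make the comparison concrete, the 0.4.0 / 0.3.1 ratios can be computed
directly from the two tables. The snippet below is only a sketch; TIMINGS and
ratios are names I made up, and the numbers are copied from the tables as
reported. A ratio above 1 means 0.4.0 is slower, below 1 means it is faster.

```python
# (python version, rows) -> {feather version: (write feather, read feather)}
# Values are copied from the two tables above.
TIMINGS = {
    ("2.7", "1M"): {
        "0.3.1": (0.06216781139, 0.05903599262),
        "0.4.0": (0.1335380793, 0.04576666355),
    },
    ("3.6", "1M"): {
        "0.3.1": (0.07768514156, 0.09041910172),
        "0.4.0": (0.08690385818, 0.05801310539),
    },
    ("2.7", "30M"): {
        "0.3.1": (1.747310066, 2.35606482),
        "0.4.0": (3.5653723, 1.934461188),
    },
    ("3.6", "30M"): {
        "0.3.1": (2.407458949, 2.811572456),
        "0.4.0": (2.925034189, 1.852504301),
    },
}


def ratios(timings):
    """Return {(python, rows): (write ratio, read ratio)} for 0.4.0 vs 0.3.1."""
    out = {}
    for key, by_version in timings.items():
        old_write, old_read = by_version["0.3.1"]
        new_write, new_read = by_version["0.4.0"]
        out[key] = (new_write / old_write, new_read / old_read)
    return out


for (py, rows), (write_ratio, read_ratio) in sorted(ratios(TIMINGS).items()):
    print("python %s, %s rows: write x%.2f, read x%.2f"
          % (py, rows, write_ratio, read_ratio))
```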
One other thing: I noticed that feather-format-0.3.1 does not even depend on
Arrow, so the performance difference reflects more than just an Arrow version
upgrade. I also think we need some thorough benchmarks for Arrow. Or do we
already have them?
> [Python] Possible performance regression in Feather read/write path
> -------------------------------------------------------------------
>
> Key: ARROW-2059
> URL: https://issues.apache.org/jira/browse/ARROW-2059
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Reporter: Wes McKinney
> Assignee: Jingyuan Wang
> Priority: Major
> Fix For: 0.9.0
>
>
> See discussion in https://github.com/wesm/feather/issues/329. Needs to be
> investigated.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)