[ https://issues.apache.org/jira/browse/ARROW-12428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17324043#comment-17324043 ]
David Li commented on ARROW-12428:
----------------------------------
And for local files, to confirm that pre_buffer doesn't hurt performance:
{noformat}
Pandas: 14.566267257090658 seconds
PyArrow: 6.649410092970356 seconds
PyArrow (pre-buffer): 6.627140663098544 seconds
{noformat}
This is on a system with NVMe storage, so results may vary on spinning-rust
drives or SATA SSDs.
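A minimal sketch of how a comparison like this could be timed. It is not the script that produced the numbers above. It assumes pandas and pyarrow are installed and that "Pandas" means pd.read_parquet; the helper names time_reader and benchmark_parquet and the file path are hypothetical.

```python
import time


def time_reader(read_fn, repeats=3):
    """Call read_fn() `repeats` times and return the best wall-clock time in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        read_fn()
        best = min(best, time.perf_counter() - start)
    return best


def benchmark_parquet(path, repeats=3):
    """Time three ways of reading one Parquet file; `path` is a placeholder."""
    # Imported lazily so time_reader stays usable without these packages.
    import pandas as pd
    import pyarrow.parquet as pq

    readers = {
        # Assumption: the "Pandas" row corresponds to pd.read_parquet.
        "Pandas": lambda: pd.read_parquet(path),
        "PyArrow": lambda: pq.read_table(path, pre_buffer=False),
        "PyArrow (pre-buffer)": lambda: pq.read_table(path, pre_buffer=True),
    }
    return {name: time_reader(fn, repeats) for name, fn in readers.items()}


# Example usage (hypothetical local file):
# for name, seconds in benchmark_parquet("data.parquet").items():
#     print(f"{name}: {seconds} seconds")
```

Taking the best of several runs reduces noise, but after the first read the OS page cache will likely serve the file, so this measures warm-cache behaviour unless the cache is dropped between runs.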
> [Python] pyarrow.parquet.read_* should use pre_buffer=True
> ----------------------------------------------------------
>
> Key: ARROW-12428
> URL: https://issues.apache.org/jira/browse/ARROW-12428
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Python
> Reporter: David Li
> Assignee: David Li
> Priority: Major
> Fix For: 5.0.0
>
>
> If the user is synchronously reading a single file, we should try to read it
> as fast as possible. The one sticking point might be whether it's beneficial
> to enable this no matter the filesystem or whether we should try to only
> enable it on high-latency filesystems.