[ https://issues.apache.org/jira/browse/ARROW-12428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17324043#comment-17324043 ]
David Li commented on ARROW-12428:
----------------------------------

And for local files, to confirm that pre_buffer isn't a negative:

{noformat}
Pandas: 14.566267257090658 seconds
PyArrow: 6.649410092970356 seconds
PyArrow (pre-buffer): 6.627140663098544 seconds
{noformat}

This is on a system with NVME storage, so results may vary for spinning-rust or SATA SSDs.

> [Python] pyarrow.parquet.read_* should use pre_buffer=True
> ----------------------------------------------------------
>
>                 Key: ARROW-12428
>                 URL: https://issues.apache.org/jira/browse/ARROW-12428
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Python
>            Reporter: David Li
>            Assignee: David Li
>            Priority: Major
>             Fix For: 5.0.0
>
>
> If the user is synchronously reading a single file, we should try to read it
> as fast as possible. The one sticking point might be whether it's beneficial
> to enable this no matter the filesystem or whether we should try to only
> enable it on high-latency filesystems.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)