[
https://issues.apache.org/jira/browse/ARROW-7681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17084251#comment-17084251
]
Remi Dettai commented on ARROW-7681:
------------------------------------
I've proposed a fix in [https://github.com/apache/arrow/pull/6949] that uses a
modified version of BufReader instead of nightly methods.
I've tested it on a large parquet of mine:
* on fast disk, both versions take ~30s to read the column
* on a slow mount, the old version takes ~160s and the fixed one still
takes ~30s (still CPU bound)
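For illustration only, here is a minimal sketch of the behavior behind those numbers. It is not the code from the pull request. `CountingReader` and `underlying_reads` are hypothetical names introduced for this example, and the 64-byte buffer over 256 bytes of in-memory data is an arbitrary assumption. The sketch compares `BufReader::seek`, which discards the buffer, with `BufReader::seek_relative`, which keeps it when the target lies inside the buffer. `seek_relative` is assumed to be available on stable, as it is in current Rust toolchains.

```rust
use std::io::{BufReader, Cursor, Read, Result, Seek, SeekFrom};

/// Hypothetical helper: wraps a reader and counts calls to the underlying `read`.
struct CountingReader<R> {
    inner: R,
    reads: usize,
}

impl<R: Read> Read for CountingReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> Result<usize> {
        self.reads += 1;
        self.inner.read(buf)
    }
}

impl<R: Seek> Seek for CountingReader<R> {
    fn seek(&mut self, pos: SeekFrom) -> Result<u64> {
        self.inner.seek(pos)
    }
}

/// Reads one byte, skips 8 bytes forward, then reads one more byte.
/// Returns the second byte and the number of reads that reached the source.
fn underlying_reads(use_seek_relative: bool) -> Result<(u8, usize)> {
    // Assumed test data: bytes 0..=255, so each byte's value equals its offset.
    let data: Vec<u8> = (0..=255u8).collect();
    let source = CountingReader { inner: Cursor::new(data), reads: 0 };
    let mut reader = BufReader::with_capacity(64, source);

    let mut byte = [0u8; 1];
    reader.read_exact(&mut byte)?; // fills the 64-byte buffer with one read

    if use_seek_relative {
        // Target offset 9 is inside the buffer, so the buffer is kept.
        reader.seek_relative(8)?;
    } else {
        // Always discards the buffer, even though offset 9 was buffered.
        reader.seek(SeekFrom::Current(8))?;
    }

    reader.read_exact(&mut byte)?;
    Ok((byte[0], reader.get_ref().reads))
}
```

Calling `underlying_reads(false)` and `underlying_reads(true)` returns the same byte in both cases, but the plain `seek` path reaches the underlying source twice while the `seek_relative` path reaches it once. On a slow mount, each of those extra reads is a round trip to the storage.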
> [Rust] Explicitly seeking a BufReader will discard the internal buffer
> ----------------------------------------------------------------------
>
> Key: ARROW-7681
> URL: https://issues.apache.org/jira/browse/ARROW-7681
> Project: Apache Arrow
> Issue Type: Bug
> Components: Rust
> Reporter: Max Burke
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.0.0
>
> Time Spent: 2h 20m
> Remaining Estimate: 0h
>
> This behavior was observed in the Parquet Rust file reader
> (parquet/src/util/io.rs).
>
> Pull request: [https://github.com/apache/arrow/pull/6280]
>
> From the Rust documentation for BufReader:
>
> "Seeking always discards the internal buffer, even if the seek position would
> otherwise fall within it. This guarantees that calling {{.into_inner()}}
> immediately after a seek yields the underlying reader at the same position."
>
> [https://doc.rust-lang.org/std/io/struct.BufReader.html#impl-Seek]