[ https://issues.apache.org/jira/browse/PARQUET-1636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16903217#comment-16903217 ]
Wes McKinney commented on PARQUET-1636:
---------------------------------------
I would like to understand the nature of the problem.
The stream is created here:
https://github.com/apache/arrow/blob/master/cpp/src/parquet/file_reader.cc#L119
The raw file position should not be moved outside the span of the
ColumnChunk in the file; if that is happening, it is a bug. The buffering option
is intended to buffer reads incrementally through the ColumnChunk rather than
read the entire chunk into memory (the alternative).
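As a rough illustration of that intended behaviour (a minimal sketch, not the parquet-cpp code), assume a hypothetical random-access source whose reads each name their own offset. A buffered reader can then walk one ColumnChunk in fixed-size pieces without reading past the chunk's end and without touching any shared file position. The names `RandomAccessSource`, `ChunkBufferedReader`, `ReadAt` and `buffer_size` are illustrative assumptions, not the library's API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <utility>
#include <vector>

// Hypothetical random-access source: each read names its own offset, so there
// is no shared "current position" for a reader to move.
class RandomAccessSource {
 public:
  explicit RandomAccessSource(std::string data) : data_(std::move(data)) {}

  // Copies up to n bytes starting at offset into out; returns the count copied.
  int64_t ReadAt(int64_t offset, int64_t n, uint8_t* out) const {
    ++num_reads_;
    int64_t k = std::max<int64_t>(0, std::min(n, size() - offset));
    if (k > 0) std::memcpy(out, data_.data() + offset, static_cast<std::size_t>(k));
    return k;
  }
  int64_t size() const { return static_cast<int64_t>(data_.size()); }
  int64_t num_reads() const { return num_reads_; }

 private:
  std::string data_;
  mutable int64_t num_reads_ = 0;
};

// Hypothetical buffered reader over one column chunk [start, start + length).
// It holds at most buffer_size bytes in memory and never reads outside the span.
class ChunkBufferedReader {
 public:
  ChunkBufferedReader(const RandomAccessSource* src, int64_t start,
                      int64_t length, int64_t buffer_size)
      : src_(src), end_(start + length), next_(start),
        buffer_(static_cast<std::size_t>(buffer_size)) {
    assert(buffer_size > 0 && start >= 0 && length >= 0);
    assert(start + length <= src->size());
  }

  // Returns up to n bytes of the chunk, refilling the buffer as needed.
  std::string Read(int64_t n) {
    std::string out;
    while (n > 0) {
      if (pos_ == filled_ && !Refill()) break;  // chunk exhausted
      int64_t k = std::min(n, filled_ - pos_);
      out.append(reinterpret_cast<const char*>(buffer_.data()) + pos_,
                 static_cast<std::size_t>(k));
      pos_ += k;
      n -= k;
    }
    return out;
  }

 private:
  // Loads the next buffer_size bytes of the chunk, clamped to the chunk end.
  bool Refill() {
    int64_t want = std::min(static_cast<int64_t>(buffer_.size()), end_ - next_);
    if (want <= 0) return false;
    filled_ = src_->ReadAt(next_, want, buffer_.data());
    next_ += filled_;
    pos_ = 0;
    return filled_ > 0;
  }

  const RandomAccessSource* src_;
  int64_t end_;   // one past the last byte of the chunk
  int64_t next_;  // file offset of the next byte to load
  std::vector<uint8_t> buffer_;
  int64_t pos_ = 0;
  int64_t filled_ = 0;
};
```

With a 4-byte buffer over a 10-byte chunk, at most 4 bytes of the chunk are held in memory at a time, and the bytes before and after the chunk are never read.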
> [C++] Incompatibility due to moving from Parquet to Arrow IO interfaces
> -----------------------------------------------------------------------
>
> Key: PARQUET-1636
> URL: https://issues.apache.org/jira/browse/PARQUET-1636
> Project: Parquet
> Issue Type: Improvement
> Components: parquet-cpp
> Reporter: Deepak Majeti
> Assignee: Wes McKinney
> Priority: Major
>
> We moved to the Arrow IO interfaces as part of
> https://issues.apache.org/jira/browse/PARQUET-1422
> However, the BufferedInputStream implementations between Parquet and Arrow
> are different.
> Parquet's BufferedInputStream used to take a RandomAccessSource, whereas Arrow's
> implementation takes an InputStream. As a result, the
> {{::arrow::io::BufferedInputStream::Peek}} implementation (which invokes {{Read()}})
> causes the raw source (the input to {{BufferedInputStream}}) to
> change its offset on Peek(). This did not happen with Parquet's
> BufferedInputStream implementation.
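The difference described in the issue can be reproduced with toy classes. This is a hedged sketch: `RawStream`, `StreamBufferedInput`, `RandomAccessFile` and `SourceBufferedInput` are hypothetical stand-ins, not the real Arrow or Parquet classes. A buffer that wraps a stream can only fill itself by calling `Read()` on that stream, so even a Peek advances the raw position; a buffer that wraps a random-access source reads at an explicit offset and leaves no shared position behind.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>

// Hypothetical raw stream with one mutable position, like an InputStream.
class RawStream {
 public:
  explicit RawStream(std::string data) : data_(std::move(data)) {}
  // Returns up to n bytes from the current position and advances it.
  std::string Read(int64_t n) {
    std::string out = data_.substr(static_cast<std::size_t>(pos_),
                                   static_cast<std::size_t>(n));
    pos_ += static_cast<int64_t>(out.size());
    return out;
  }
  int64_t Tell() const { return pos_; }

 private:
  std::string data_;
  int64_t pos_ = 0;
};

// Stream-based buffering: Peek() can only fill its buffer by calling Read()
// on the raw stream, so the raw position moves although nothing was consumed.
class StreamBufferedInput {
 public:
  explicit StreamBufferedInput(RawStream* raw) : raw_(raw) {}
  std::string Peek(int64_t n) {
    assert(n >= 0);
    int64_t have = static_cast<int64_t>(buf_.size());
    if (have < n) buf_ += raw_->Read(n - have);
    return buf_.substr(0, static_cast<std::size_t>(n));
  }

 private:
  RawStream* raw_;
  std::string buf_;
};

// Hypothetical random-access file: every read names its own offset.
class RandomAccessFile {
 public:
  explicit RandomAccessFile(std::string data) : data_(std::move(data)) {}
  std::string ReadAt(int64_t offset, int64_t n) const {
    if (offset >= static_cast<int64_t>(data_.size())) return "";
    return data_.substr(static_cast<std::size_t>(offset),
                        static_cast<std::size_t>(n));
  }

 private:
  std::string data_;
};

// Source-based buffering: Peek() fills its buffer with ReadAt() from this
// reader's own logical offset, so there is no raw position to disturb.
class SourceBufferedInput {
 public:
  SourceBufferedInput(const RandomAccessFile* file, int64_t offset)
      : file_(file), offset_(offset) {}
  std::string Peek(int64_t n) {
    assert(n >= 0);
    int64_t have = static_cast<int64_t>(buf_.size());
    if (have < n) buf_ += file_->ReadAt(offset_ + have, n - have);
    return buf_.substr(0, static_cast<std::size_t>(n));
  }

 private:
  const RandomAccessFile* file_;
  int64_t offset_;  // file offset of buf_[0]
  std::string buf_;
};
```

After `Peek(4)` on the stream-based wrapper, the raw stream reports `Tell() == 4` although the caller consumed nothing, which is the offset change the issue describes.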
--
This message was sent by Atlassian JIRA
(v7.6.14#76016)