[ 
https://issues.apache.org/jira/browse/PARQUET-1636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16903217#comment-16903217
 ] 

Wes McKinney commented on PARQUET-1636:
---------------------------------------

I would like to understand the nature of the problem. 

The stream is created here:

https://github.com/apache/arrow/blob/master/cpp/src/parquet/file_reader.cc#L119

The raw file position should not be moved outside of the span of the 
ColumnChunk in the file. If that's happening, that is a bug. The intention of 
the buffering option is to do read buffering through the ColumnChunk instead of 
reading the entire chunk into memory (the alternative). 
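
To make the intended behavior concrete, here is a minimal sketch of read 
buffering that stays inside a chunk's byte span. It is not the code in 
file_reader.cc: ToyFile, ChunkBufferedReader and max_touched() are 
hypothetical names made up for this illustration, and the in-memory string 
stands in for a real file.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>

// Hypothetical stand-in for a random-access file. It records the furthest
// byte offset any read has touched so the bound can be checked afterwards.
class ToyFile {
 public:
  explicit ToyFile(std::string data) : data_(std::move(data)) {}

  // Positional read: returns up to nbytes starting at offset.
  std::string ReadAt(int64_t offset, int64_t nbytes) {
    const int64_t size = static_cast<int64_t>(data_.size());
    const int64_t end = std::min(offset + nbytes, size);
    if (offset < 0 || offset >= end) return std::string();
    max_touched_ = std::max(max_touched_, end);
    return data_.substr(static_cast<std::size_t>(offset),
                        static_cast<std::size_t>(end - offset));
  }

  int64_t max_touched() const { return max_touched_; }

 private:
  std::string data_;
  int64_t max_touched_ = 0;
};

// Hypothetical buffered reader over the span [chunk_start, chunk_start +
// chunk_length). Each refill is clamped to the bytes left in the chunk.
class ChunkBufferedReader {
 public:
  ChunkBufferedReader(ToyFile* file, int64_t chunk_start, int64_t chunk_length,
                      int64_t buffer_size)
      : file_(file),
        pos_(chunk_start),
        chunk_end_(chunk_start + chunk_length),
        buffer_size_(buffer_size) {
    assert(chunk_length >= 0 && buffer_size > 0);
  }

  // Returns up to nbytes; fewer once the end of the chunk is reached.
  std::string Read(int64_t nbytes) {
    std::string out;
    while (nbytes > 0) {
      if (buffered_.empty()) {
        const int64_t to_fill = std::min(buffer_size_, chunk_end_ - pos_);
        if (to_fill <= 0) break;  // chunk exhausted, never read past it
        buffered_ = file_->ReadAt(pos_, to_fill);
        if (buffered_.empty()) break;  // file ended early
        pos_ += static_cast<int64_t>(buffered_.size());
      }
      const std::size_t take = std::min(static_cast<std::size_t>(nbytes),
                                        buffered_.size());
      out.append(buffered_, 0, take);
      buffered_.erase(0, take);
      nbytes -= static_cast<int64_t>(take);
    }
    return out;
  }

 private:
  ToyFile* file_;
  int64_t pos_;        // next file offset to refill from
  int64_t chunk_end_;  // one past the last byte of the chunk
  int64_t buffer_size_;
  std::string buffered_;
};
```

With a small buffer, reading a chunk to its end takes several refills, yet 
no read reaches the bytes that follow the chunk in the file.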

> [C++] Incompatibility due to moving from Parquet to Arrow IO interfaces
> -----------------------------------------------------------------------
>
>                 Key: PARQUET-1636
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1636
>             Project: Parquet
>          Issue Type: Improvement
>          Components: parquet-cpp
>            Reporter: Deepak Majeti
>            Assignee: Wes McKinney
>            Priority: Major
>
> We moved to the Arrow IO interfaces as part of 
> https://issues.apache.org/jira/browse/PARQUET-1422
> However, the BufferedInputStream implementations in Parquet and Arrow 
> differ.
> Parquet's BufferedInputStream used to take a RandomAccessSource. Arrow's 
> implementation takes an InputStream. As a result, the 
> {{::arrow::io::BufferedInputStream::Peek}} implementation (which invokes 
> Read()) causes the raw source (the input to {{BufferedInputStream}}) to 
> change its offset on Peek(). This did not happen in Parquet's 
> BufferedInputStream implementation.
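
The effect described in the issue can be reproduced with a toy model. This 
is only a sketch under assumptions: ToySequentialStream and 
ToyBufferedStream are hypothetical classes written for this illustration, 
not Arrow's or Parquet's implementations. The one property it models is that 
a buffer layered over a sequential stream can only fill itself by calling 
Read() on that stream.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>

// Hypothetical sequential stream: every Read() advances its position.
class ToySequentialStream {
 public:
  explicit ToySequentialStream(std::string data) : data_(std::move(data)) {}

  std::string Read(int64_t nbytes) {
    const int64_t size = static_cast<int64_t>(data_.size());
    const int64_t n = std::max<int64_t>(0, std::min(nbytes, size - pos_));
    std::string out = data_.substr(static_cast<std::size_t>(pos_),
                                   static_cast<std::size_t>(n));
    pos_ += n;
    return out;
  }

  int64_t Tell() const { return pos_; }

 private:
  std::string data_;
  int64_t pos_ = 0;
};

// Hypothetical buffered wrapper around a sequential stream.
class ToyBufferedStream {
 public:
  ToyBufferedStream(ToySequentialStream* raw, int64_t buffer_size)
      : raw_(raw), buffer_size_(buffer_size) {
    assert(buffer_size > 0);
  }

  // Returns up to nbytes without consuming them from this stream. Filling
  // the buffer still calls raw_->Read(), which moves the raw stream forward.
  std::string Peek(int64_t nbytes) {
    assert(nbytes >= 0);
    Fill(nbytes);
    return buffered_.substr(0, static_cast<std::size_t>(nbytes));
  }

  // Returns up to nbytes and consumes them from this stream.
  std::string Read(int64_t nbytes) {
    std::string out = Peek(nbytes);
    buffered_.erase(0, out.size());
    consumed_ += static_cast<int64_t>(out.size());
    return out;
  }

  // Logical position of the buffered stream (bytes handed out so far).
  int64_t Tell() const { return consumed_; }

 private:
  void Fill(int64_t nbytes) {
    const int64_t have = static_cast<int64_t>(buffered_.size());
    if (have < nbytes) {
      buffered_ += raw_->Read(std::max(nbytes, buffer_size_) - have);
    }
  }

  ToySequentialStream* raw_;
  int64_t buffer_size_;
  int64_t consumed_ = 0;
  std::string buffered_;
};
```

In this model, peeking two bytes through a four-byte buffer leaves the 
buffered stream's own Tell() unchanged but moves the raw stream's Tell() by a 
full buffer, so code that later uses the raw stream directly sees a shifted 
offset.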



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
