[ 
https://issues.apache.org/jira/browse/PARQUET-1636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16903206#comment-16903206
 ] 

Zherui Cao commented on PARQUET-1636:
-------------------------------------

I checked out the previous version and found that BufferedInputStream::Peek() 
did move the offset when it called Read or ReadAt.

However, the previous InMemoryInputStream::Read() called its own Peek() and 
then ReadAt(offset). Because it always specified the offset explicitly, the 
previous implementation worked correctly.

Now InMemoryInputStream (old) has been replaced by BufferedInputStream (new), 
which wraps an InputStream (non-seekable) rather than a RandomAccessFile 
(seekable). An InputStream has no function such as Seek() or ReadAt(), so a 
call to Read() has no control over the offset.
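
To illustrate the difference, here is a minimal toy sketch. Every class and 
function name in it is hypothetical and is not the Arrow or Parquet API. It 
only models the assumption that a peek done through a positional ReadAt() 
leaves the raw source's offset alone, while a peek that has to fill a buffer 
through a sequential Read() advances it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical stand-in for a raw source. It offers both a positional
// ReadAt() (seekable style) and a sequential Read() (stream style).
class ToyRawSource {
 public:
  explicit ToyRawSource(std::string data) : data_(std::move(data)) {}

  // Positional read: never changes the source's own offset.
  std::string ReadAt(std::size_t offset, std::size_t n) const {
    if (offset >= data_.size()) return "";
    return data_.substr(offset, n);
  }

  // Sequential read: always advances the source's own offset.
  std::string Read(std::size_t n) {
    std::string out = ReadAt(pos_, n);
    pos_ += out.size();
    return out;
  }

  std::size_t Tell() const { return pos_; }

 private:
  std::string data_;
  std::size_t pos_ = 0;
};

// Old style (toy): peeks with an explicit offset via ReadAt().
class ToySeekablePeeker {
 public:
  explicit ToySeekablePeeker(ToyRawSource* raw) : raw_(raw) {}
  std::string Peek(std::size_t n) const {
    return raw_->ReadAt(raw_->Tell(), n);
  }

 private:
  ToyRawSource* raw_;
};

// New style (toy): only sequential Read() is available, so Peek() must
// pull bytes from the raw source into an internal buffer.
class ToyBufferedStream {
 public:
  explicit ToyBufferedStream(ToyRawSource* raw) : raw_(raw) {}
  std::string Peek(std::size_t n) {
    if (buffer_.size() < n) buffer_ += raw_->Read(n - buffer_.size());
    return buffer_.substr(0, n);
  }

 private:
  ToyRawSource* raw_;
  std::string buffer_;
};

// Raw-source offset after one Peek(4) through the seekable-style wrapper.
std::size_t RawOffsetAfterSeekablePeek() {
  ToyRawSource raw("PAR1data");
  ToySeekablePeeker peeker(&raw);
  assert(peeker.Peek(4) == "PAR1");
  return raw.Tell();
}

// Raw-source offset after one Peek(4) through the buffered wrapper.
std::size_t RawOffsetAfterBufferedPeek() {
  ToyRawSource raw("PAR1data");
  ToyBufferedStream buffered(&raw);
  assert(buffered.Peek(4) == "PAR1");
  return raw.Tell();
}
```

In this toy model both wrappers return the same peeked bytes, but only the 
buffered one leaves the raw source at a different offset, so any code that 
later reads from the raw source directly would see different data.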

> [C++] Incompatibility due to moving from Parquet to Arrow IO interfaces
> -----------------------------------------------------------------------
>
>                 Key: PARQUET-1636
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1636
>             Project: Parquet
>          Issue Type: Improvement
>          Components: parquet-cpp
>            Reporter: Deepak Majeti
>            Assignee: Wes McKinney
>            Priority: Major
>
> We moved to the Arrow IO interfaces as part of 
> https://issues.apache.org/jira/browse/PARQUET-1422
> However, the BufferedInputStream implementations between Parquet and Arrow 
> are different.
> Parquet's BufferedInputStream used to take a RandomAccessSource. Arrow's 
> implementation takes an InputStream. As a result, the 
> {{::arrow::io::BufferedInputStream::Peek()}} implementation (which invokes 
> Read()) causes the raw source (the input to {{BufferedInputStream}}) to 
> change its offset on Peek(). This did not happen in Parquet's 
> BufferedInputStream implementation.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
