[ 
https://issues.apache.org/jira/browse/PARQUET-1636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16903206#comment-16903206
 ] 

Zherui Cao edited comment on PARQUET-1636 at 8/8/19 6:14 PM:
-------------------------------------------------------------

I checked out the previous version and found that BufferedInputStream::Peek() 
did move the offset when it called Read or ReadAt.

However, the previous InMemoryInputStream::Read() called its Peek() and then 
ReadAt(offset). Because it always specified the offset explicitly, the previous 
implementation worked well. In other words, it did not matter where the offset 
had moved, since every call to Read() specified the position.

Currently, InMemoryInputStream (old) has been replaced by BufferedInputStream 
(new), which uses an InputStream (non-seekable) rather than a RandomAccessFile 
(seekable). An InputStream has no function like Seek() or ReadAt(), so a call 
to Read() has no control over the offset. 
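
To illustrate why the old arrangement was unaffected by offset drift, here is a 
minimal sketch. It is not the Parquet or Arrow code: ToyRandomAccessSource, 
ToyPositionalReader and DemoPositionalReads are hypothetical names made up for 
this example, and the source is modeled as a plain std::string. The only 
point is that a reader which tracks its own offset and passes it to ReadAt() 
on every call does not care where the source's internal cursor ends up.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical seekable source: every read names its position explicitly.
class ToyRandomAccessSource {
 public:
  explicit ToyRandomAccessSource(std::string data) : data_(std::move(data)) {}

  // Reads up to n bytes starting at pos. As a side effect the internal
  // cursor moves, mimicking a seek-then-read implementation.
  std::string ReadAt(std::size_t pos, std::size_t n) {
    pos = std::min(pos, data_.size());
    n = std::min(n, data_.size() - pos);
    cursor_ = pos + n;
    return data_.substr(pos, n);
  }

  std::size_t cursor() const { return cursor_; }

 private:
  std::string data_;
  std::size_t cursor_ = 0;
};

// Hypothetical reader in the style described above: it keeps its own
// offset and always passes it to ReadAt().
class ToyPositionalReader {
 public:
  explicit ToyPositionalReader(ToyRandomAccessSource* source)
      : source_(source) {}

  // Looks at the next n bytes without advancing this reader's offset.
  std::string Peek(std::size_t n) { return source_->ReadAt(offset_, n); }

  // Peeks, then advances this reader's own offset by the bytes returned.
  std::string Read(std::size_t n) {
    std::string out = Peek(n);
    offset_ += out.size();
    return out;
  }

 private:
  ToyRandomAccessSource* source_;
  std::size_t offset_ = 0;
};

// Small demonstration: the source cursor is disturbed between reads,
// yet the reader still returns consecutive bytes.
inline void DemoPositionalReads() {
  ToyRandomAccessSource source("abcdefgh");
  ToyPositionalReader reader(&source);
  assert(reader.Read(3) == "abc");
  source.ReadAt(6, 2);  // someone else moves the source cursor to 8
  assert(reader.Read(3) == "def");
}
```

In this sketch the cursor inside the source is free to move, because no read 
ever depends on it.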


was (Author: czxrrr):
I checked out the previous version and found that BufferedInputStream::Peek() 
did move the offset when it called Read or ReadAt.

However, the previous InMemoryInputStream::Read() called its Peek() and then 
ReadAt(offset). Because it always specified the offset explicitly, the previous 
implementation worked well.

Currently, InMemoryInputStream (old) has been replaced by BufferedInputStream 
(new), which uses an InputStream (non-seekable) rather than a RandomAccessFile 
(seekable). An InputStream has no function like Seek() or ReadAt(), so a call 
to Read() has no control over the offset. 

> [C++] Incompatibility due to moving from Parquet to Arrow IO interfaces
> -----------------------------------------------------------------------
>
>                 Key: PARQUET-1636
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1636
>             Project: Parquet
>          Issue Type: Improvement
>          Components: parquet-cpp
>            Reporter: Deepak Majeti
>            Assignee: Wes McKinney
>            Priority: Major
>
> We moved to the Arrow IO interfaces as part of 
> https://issues.apache.org/jira/browse/PARQUET-1422
> However, the BufferedInputStream implementations between Parquet and Arrow 
> are different.
> Parquet's BufferedInputStream used to take a RandomAccessSource. Arrow's 
> implementation takes an InputStream. As a result, the 
> {{::arrow::io::BufferedInputStream::Peek}} implementation (which invokes 
> Read()) causes the raw source (the input to {{BufferedInputStream}}) to 
> change its offset on Peek(). This did not happen in Parquet's 
> BufferedInputStream implementation.
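
The effect described in the issue can be reproduced with a small toy model. 
This is only a sketch under stated assumptions, not the Arrow implementation: 
ToyInputStream and ToyBufferedStream are hypothetical classes invented for 
this example, with a std::string standing in for the raw source. It shows that 
when Peek() has to fill its buffer by calling Read() on a non-seekable stream, 
the raw stream's position advances even though the buffered stream's logical 
position does not.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical non-seekable stream: Read() always consumes from the
// current position and there is no Seek() or ReadAt().
class ToyInputStream {
 public:
  explicit ToyInputStream(std::string data) : data_(std::move(data)) {}

  std::string Read(std::size_t n) {
    n = std::min(n, data_.size() - pos_);
    std::string out = data_.substr(pos_, n);
    pos_ += n;
    return out;
  }

  std::size_t Tell() const { return pos_; }

 private:
  std::string data_;
  std::size_t pos_ = 0;
};

// Hypothetical buffered wrapper over a ToyInputStream.
class ToyBufferedStream {
 public:
  explicit ToyBufferedStream(ToyInputStream* raw) : raw_(raw) {}

  // Returns the next n bytes without advancing the logical position.
  // Filling the buffer requires Read() on the raw stream, which moves
  // the raw stream's position.
  std::string Peek(std::size_t n) {
    if (buffer_.size() < n) {
      buffer_ += raw_->Read(n - buffer_.size());
    }
    return buffer_.substr(0, std::min(n, buffer_.size()));
  }

  // Consumes up to n bytes and advances the logical position.
  std::string Read(std::size_t n) {
    std::string out = Peek(n);
    buffer_.erase(0, out.size());
    logical_pos_ += out.size();
    return out;
  }

  std::size_t Tell() const { return logical_pos_; }

 private:
  ToyInputStream* raw_;
  std::string buffer_;
  std::size_t logical_pos_ = 0;
};

// Small demonstration: Peek() leaves the buffered position alone but
// advances the raw stream underneath it.
inline void DemoPeekMovesRawOffset() {
  ToyInputStream raw("abcdefgh");
  ToyBufferedStream buffered(&raw);
  assert(buffered.Peek(4) == "abcd");
  assert(buffered.Tell() == 0);  // logical position unchanged
  assert(raw.Tell() == 4);       // raw source has moved
}
```

Any code that keeps using the raw stream after such a Peek() would start 
reading from the moved position, which is the incompatibility the issue 
describes.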



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
