Ryan Blue created PARQUET-674:
---------------------------------

             Summary: Add an abstraction to get the length of a stream
                 Key: PARQUET-674
                 URL: https://issues.apache.org/jira/browse/PARQUET-674
             Project: Parquet
          Issue Type: Bug
          Components: parquet-mr
            Reporter: Ryan Blue


PARQUET-400 introduces {{SeekableInputStream}} to wrap Hadoop v1 and v2 streams 
and provide ByteBuffer access transparently. This can also be used as an 
abstraction to allow Parquet to work without the Hadoop API. The missing 
component is a way to find the length of the file, which is needed to locate 
and read the footer. This could be done by adding a {{getLength}} method to the new 
stream interface, but I think there is value in adding a higher-level 
abstraction that carries information about the file and can open streams for 
it. This abstraction could be passed to a PageReadStore, which could have more 
complicated logic including parallel streams to read column chunks.
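
A rough sketch of what such a higher-level abstraction might look like follows. 
All names here ({{InputFileSketch}}, {{ByteArrayInputFile}}, {{FooterSketch}}, 
{{readTail}}) are hypothetical and only illustrate the idea; they are not an 
existing parquet-mr API. To stay self-contained, the sketch uses a plain 
{{java.io.InputStream}} in place of {{SeekableInputStream}}, so it skips forward 
rather than seeking. It relies only on the fact that a Parquet file ends with a 
4-byte footer length followed by the 4-byte magic "PAR1".

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical file-level abstraction: it knows the file length and can
// open any number of independent streams over the same file.
interface InputFileSketch {
    long getLength() throws IOException;

    InputStream newStream() throws IOException;
}

// In-memory implementation, present only so the sketch runs without Hadoop.
final class ByteArrayInputFile implements InputFileSketch {
    private final byte[] data;

    ByteArrayInputFile(byte[] data) {
        this.data = data;
    }

    @Override
    public long getLength() {
        return data.length;
    }

    @Override
    public InputStream newStream() {
        return new ByteArrayInputStream(data);
    }
}

final class FooterSketch {
    private FooterSketch() {
    }

    // Reads the last tailLength bytes of the file. The length comes from the
    // file abstraction, not from the stream.
    static byte[] readTail(InputFileSketch file, int tailLength) throws IOException {
        long length = file.getLength();
        if (tailLength < 0 || tailLength > length) {
            throw new IllegalArgumentException("tailLength out of range: " + tailLength);
        }
        try (InputStream in = file.newStream()) {
            // A seekable stream would seek here; a plain stream has to skip.
            long toSkip = length - tailLength;
            while (toSkip > 0) {
                long skipped = in.skip(toSkip);
                if (skipped <= 0) {
                    throw new IOException("Could not skip to the tail of the file");
                }
                toSkip -= skipped;
            }
            byte[] tail = new byte[tailLength];
            int offset = 0;
            while (offset < tailLength) {
                int n = in.read(tail, offset, tailLength - offset);
                if (n < 0) {
                    throw new EOFException("File ended before the tail was read");
                }
                offset += n;
            }
            return tail;
        }
    }
}
```

Because each call to {{newStream}} returns an independent stream, a reader 
holding this object could open one stream per column chunk, which is the kind 
of parallel read mentioned above.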



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
