Ryan Blue created PARQUET-674:
---------------------------------
Summary: Add an abstraction to get the length of a stream
Key: PARQUET-674
URL: https://issues.apache.org/jira/browse/PARQUET-674
Project: Parquet
Issue Type: Bug
Components: parquet-mr
Reporter: Ryan Blue
PARQUET-400 introduces {{SeekableInputStream}} to wrap Hadoop v1 and v2 streams
and provide ByteBuffer access transparently. The same interface can also serve
as an abstraction that lets Parquet work without the Hadoop API. The missing
piece is a way to find out how long the underlying file is, which is needed to
locate and read the footer. This could be done by adding a {{getLength}} method
to the new stream interface, but I think there is value in adding a
higher-level abstraction that carries information about the file and can open
streams for it. This abstraction could be passed to a PageReadStore, which
could then use more complicated logic, including parallel streams to read
column chunks.
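
To make the proposal concrete, here is a rough sketch of what such a file-level abstraction might look like. All names here ({{InputFile}}, {{newStream}}, {{InMemoryInputFile}}, {{FooterTail}}) are hypothetical and chosen for illustration only; this is not an existing parquet-mr API. A plain {{java.io.InputStream}} with {{skip}} stands in for {{SeekableInputStream}} so the example has no dependencies. It relies on the Parquet layout in which a file ends with a 4-byte footer length followed by the 4-byte magic {{PAR1}}.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical file-level abstraction: knows the file length and can open
// any number of independent streams over the same file.
interface InputFile {
  long getLength() throws IOException;

  InputStream newStream() throws IOException;
}

// Illustrative implementation backed by a byte array (no Hadoop involved).
final class InMemoryInputFile implements InputFile {
  private final byte[] data;

  InMemoryInputFile(byte[] data) {
    this.data = data.clone();
  }

  @Override
  public long getLength() {
    return data.length;
  }

  @Override
  public InputStream newStream() {
    return new ByteArrayInputStream(data);
  }
}

// Shows why the length matters: the footer is located from the end of the file.
final class FooterTail {
  // 4-byte footer length + 4-byte magic at the end of a Parquet file.
  static final int TAIL_SIZE = 8;

  static byte[] readTail(InputFile file) throws IOException {
    long length = file.getLength();
    if (length < TAIL_SIZE) {
      throw new IOException("File too short to contain a footer: " + length);
    }
    try (InputStream in = file.newStream()) {
      // Stand-in for a seek: skip forward to the last TAIL_SIZE bytes.
      long toSkip = length - TAIL_SIZE;
      while (toSkip > 0) {
        long skipped = in.skip(toSkip);
        if (skipped <= 0) {
          throw new IOException("Could not skip to the file tail");
        }
        toSkip -= skipped;
      }
      // Read until the tail buffer is full or the stream ends early.
      byte[] tail = new byte[TAIL_SIZE];
      int offset = 0;
      while (offset < TAIL_SIZE) {
        int n = in.read(tail, offset, TAIL_SIZE - offset);
        if (n < 0) {
          throw new IOException("Unexpected end of stream");
        }
        offset += n;
      }
      return tail;
    }
  }
}
```

Because {{newStream}} can be called more than once, a reader holding an {{InputFile}} could open several streams over the same file, which is what would allow the parallel column chunk reads mentioned above.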