[jira] [Commented] (PARQUET-674) Add an abstraction to get the length of a stream

Rohit Aggarwal (JIRA) Wed, 31 May 2017 07:26:23 -0700

    [ https://issues.apache.org/jira/browse/PARQUET-674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16031236#comment-16031236 ]


Rohit Aggarwal commented on PARQUET-674:
----------------------------------------

We have observed that this commit leaves file descriptors in the 
{{CLOSE_WAIT}} state instead of closing them, which causes problems after 
enough calls to the {{readFooters}} method. We are using Hadoop 2.7.2.

> Add an abstraction to get the length of a stream
> ------------------------------------------------
>
>                 Key: PARQUET-674
>                 URL: https://issues.apache.org/jira/browse/PARQUET-674
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-mr
>            Reporter: Ryan Blue
>            Assignee: Ryan Blue
>             Fix For: 1.9.0
>
>
> PARQUET-400 introduces {{SeekableInputStream}} to wrap Hadoop v1 and v2 
> streams and provide ByteBuffer access transparently. This can also be used as 
> an abstraction to allow Parquet to work without the Hadoop API. The missing 
> component is an abstraction that knows how long the file stream is for 
> reading the footer. This could be done by adding a {{getLength}} method to 
> the new stream interface, but I think there is value in adding a higher-level 
> abstraction that carries information about the file and can open streams for 
> it. This abstraction could be passed to a PageReadStore, which could have 
> more complicated logic including parallel streams to read column chunks.
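A minimal sketch of the kind of abstraction described above, assuming a file-level object that reports its length and opens fresh streams. The names {{LengthAwareFile}}, {{InMemoryFile}} and {{FooterReader}} are hypothetical illustrations, not the interfaces added by this issue, and a plain {{java.io.InputStream}} stands in for {{SeekableInputStream}} to keep the example self-contained. The footer logic follows the Parquet file layout (a 4-byte little-endian metadata length followed by the magic bytes {{PAR1}} at the end of the file). Each stream is closed with try-with-resources, which is the kind of cleanup that keeps descriptors from lingering in {{CLOSE_WAIT}}.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical file-level abstraction: knows its length and can open streams.
interface LengthAwareFile {
    long getLength() throws IOException;
    InputStream newStream() throws IOException;
}

// Hypothetical in-memory implementation, useful for tests without Hadoop.
final class InMemoryFile implements LengthAwareFile {
    private final byte[] data;

    InMemoryFile(byte[] data) {
        this.data = data.clone();
    }

    @Override
    public long getLength() {
        return data.length;
    }

    @Override
    public InputStream newStream() {
        return new ByteArrayInputStream(data);
    }
}

final class FooterReader {
    // Reads the 4-byte little-endian footer length stored just before "PAR1".
    static int readFooterLength(LengthAwareFile file) throws IOException {
        long len = file.getLength();
        if (len < 8) {
            throw new IOException("file too short to be a Parquet file");
        }
        // try-with-resources guarantees the stream (and its descriptor) is closed.
        try (InputStream in = file.newStream()) {
            long toSkip = len - 8;
            while (toSkip > 0) {            // skip() may skip fewer bytes than asked
                long n = in.skip(toSkip);
                if (n <= 0) {
                    throw new IOException("unexpected end of stream");
                }
                toSkip -= n;
            }
            byte[] tail = new byte[8];
            int off = 0;
            while (off < tail.length) {     // read() may return a partial buffer
                int r = in.read(tail, off, tail.length - off);
                if (r < 0) {
                    throw new IOException("unexpected end of stream");
                }
                off += r;
            }
            if (tail[4] != 'P' || tail[5] != 'A' || tail[6] != 'R' || tail[7] != '1') {
                throw new IOException("missing PAR1 magic");
            }
            return (tail[0] & 0xff)
                    | (tail[1] & 0xff) << 8
                    | (tail[2] & 0xff) << 16
                    | (tail[3] & 0xff) << 24;
        }
    }
}
```

Because the reader only depends on the interface, a Hadoop-backed implementation and an in-memory one can be swapped freely, which is the decoupling from the Hadoop API that the description is after.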



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
