[jira] [Updated] (PARQUET-2151) Drop Hadoop 1 input stream support from parquet-hadoop

Steve Loughran (Jira) Mon, 06 Jun 2022 10:38:06 -0700


     [ 
https://issues.apache.org/jira/browse/PARQUET-2151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Steve Loughran updated PARQUET-2151:
------------------------------------
    Summary: Drop Hadoop 1 input stream support from parquet-hadoop   (was: 
parquet-hadoop to drop Hadoop 1 input stream support)

> Drop Hadoop 1 input stream support from parquet-hadoop 
> -------------------------------------------------------
>
>                 Key: PARQUET-2151
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2151
>             Project: Parquet
>          Issue Type: Improvement
>          Components: parquet-mr
>    Affects Versions: 1.13.0
>            Reporter: Steve Loughran
>            Priority: Minor
>
> Parquet uses reflection to load a Hadoop 2 input stream, falling back to a 
> Hadoop 1 compatible client if it is not found.
> All Hadoop 2.0.2+ releases work with H2SeekableInputStream, so 
> H1SeekableInputStream can be cut and the binding to H2SeekableInputStream 
> reworked to avoid needing reflection. This would make it a lot easier to 
> probe for and use the ByteBuffer input, and line the code up for more recent 
> Hadoop releases.
> One thing H1SeekableInputStream does do is read into a temp array if the 
> FSDataInputStream doesn't support ByteBuffer reads, that is, doesn't 
> implement ByteBufferReadable.
> However, FSDataInputStream simply forwards that call to the inner stream, 
> and only if the inner stream also implements ByteBufferReadable. Streams 
> from filesystems which don't (the cloud stores) can't be read through 
> H2SeekableInputStream.read(ByteBuffer). If this is desired, 
> H2SeekableInputStream will need to dynamically downgrade to 
> DelegatingSeekableInputStream's base methods if a call to 
> FSDataInputStream.read(ByteBuffer) fails.
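
The dynamic downgrade described in the last paragraph could look roughly like
the sketch below. It is not parquet-hadoop code. To stay self-contained it
uses stand-ins: the nested ByteBufferReadable interface and ForwardingStream
only imitate Hadoop's ByteBufferReadable and FSDataInputStream, and
DowngradingReader and copyViaReader are hypothetical names. It also assumes
that an unsupported ByteBuffer read surfaces as UnsupportedOperationException.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.util.Arrays;

final class FallbackReadSketch {

  /** Stand-in for Hadoop's ByteBufferReadable so the sketch compiles without Hadoop. */
  interface ByteBufferReadable {
    int read(ByteBuffer buf) throws IOException;
  }

  /** Stand-in for FSDataInputStream: forwards ByteBuffer reads to the inner stream. */
  static final class ForwardingStream extends InputStream implements ByteBufferReadable {
    private final InputStream inner;

    ForwardingStream(InputStream inner) {
      this.inner = inner;
    }

    @Override
    public int read() throws IOException {
      return inner.read();
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
      return inner.read(b, off, len);
    }

    @Override
    public int read(ByteBuffer buf) throws IOException {
      if (inner instanceof ByteBufferReadable) {
        return ((ByteBufferReadable) inner).read(buf);
      }
      // Assumption: the wrapper rejects the call when the inner stream cannot serve it.
      throw new UnsupportedOperationException("Byte-buffer read unsupported by input stream");
    }
  }

  /** Hypothetical reader: tries the ByteBuffer path once, then stays on the byte[] path. */
  static final class DowngradingReader {
    private final ForwardingStream in;
    private final byte[] temp = new byte[8192];
    private boolean byteBufferReadSupported = true;

    DowngradingReader(ForwardingStream in) {
      this.in = in;
    }

    int read(ByteBuffer buf) throws IOException {
      if (byteBufferReadSupported) {
        try {
          return in.read(buf);
        } catch (UnsupportedOperationException e) {
          // Downgrade permanently so later calls skip the failing path.
          byteBufferReadSupported = false;
        }
      }
      int toRead = Math.min(buf.remaining(), temp.length);
      if (toRead == 0) {
        return 0;
      }
      // Fallback: read into a temporary array, then copy into the buffer.
      int n = in.read(temp, 0, toRead);
      if (n > 0) {
        buf.put(temp, 0, n);
      }
      return n;
    }

    boolean usingFallback() {
      return !byteBufferReadSupported;
    }
  }

  /** Usage example: drain an in-memory stream, which has no ByteBuffer read support. */
  static byte[] copyViaReader(byte[] data) {
    DowngradingReader reader =
        new DowngradingReader(new ForwardingStream(new ByteArrayInputStream(data)));
    ByteBuffer buf = ByteBuffer.allocate(data.length);
    try {
      while (buf.hasRemaining() && reader.read(buf) >= 0) {
        // Keep reading until the buffer is full or the stream ends.
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return Arrays.copyOf(buf.array(), buf.position());
  }
}
```

Remembering the failure in a flag means the exception cost is paid at most
once per stream. Whether to probe the inner stream up front instead of
catching the exception is a design choice the issue leaves open.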



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
