[ 
https://issues.apache.org/jira/browse/HADOOP-19131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-19131:
------------------------------------
    Description: 
Parquet, Avro etc. are still stuck building against older Hadoop releases.

This makes using new APIs hard (PARQUET-2171) and means that APIs which are 5 
years old such as HADOOP-15229 just aren't picked up.

This lack of openFile() adoption hurts working with files in cloud storage, as:
* extra HEAD requests are made
* read policies can't be explicitly set
* split start/end can't be passed down
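
To make the three points above concrete, here is a rough sketch of the options a reader could hand to the openFile() builder for one split. The key strings are the ones I believe are defined in org.apache.hadoop.fs.Options.OpenFileOptions; they are written as literals only so the sketch compiles without Hadoop on the classpath. The helper class and its method are hypothetical, not part of any Hadoop release.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: collects the openFile() options for reading one split. */
final class OpenFileSplitOptions {

  // Assumed to match the constants in Options.OpenFileOptions.
  static final String READ_POLICY = "fs.option.openfile.read.policy";
  static final String SPLIT_START = "fs.option.openfile.split.start";
  static final String SPLIT_END = "fs.option.openfile.split.end";

  private OpenFileSplitOptions() {
  }

  /** Build the option map for a split covering the byte range start..end. */
  static Map<String, String> forSplit(String readPolicy, long start, long end) {
    if (start < 0 || end < start) {
      throw new IllegalArgumentException("invalid split: " + start + "-" + end);
    }
    // Insertion order is kept so the options are applied predictably.
    Map<String, String> options = new LinkedHashMap<>();
    options.put(READ_POLICY, readPolicy);
    options.put(SPLIT_START, Long.toString(start));
    options.put(SPLIT_END, Long.toString(end));
    return options;
  }
}
```

Each entry would then be applied with opt(key, value) on the builder returned by FileSystem.openFile(path). Passing an already-known FileStatus through withFileStatus() is what should let an object store connector skip the extra HEAD request.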

HADOOP-18679 added a new WrappedIO class.

This jira proposes extending it with
* more of the filesystem/input stream methods
* IOStatistics
* Parquet's DynMethods, pulled in to dynamically wrap and invoke these methods
in tests. The wrapper class, DynamicWrappedIO, is intended to be copied into
libraries (Parquet, Iceberg) for their own use.
* existing tests updated to use the dynamic binding for end-to-end testing.

Then get this into the downstream libraries and use it where appropriate.
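
As an illustration of the dynamic binding idea, here is a minimal sketch using only java.lang.reflect. It is not the DynMethods code from Parquet and not the proposed DynamicWrappedIO; the class OptionalStaticMethod is a hypothetical, simplified stand-in. It shows the pattern a library built against an older Hadoop release would need: look up a static method by name, report whether it exists, and invoke it only when it does.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

/** Hypothetical, simplified stand-in for a DynMethods-style binding. */
final class OptionalStaticMethod {

  private final Method method;  // null when the class or method is absent

  private OptionalStaticMethod(Method method) {
    this.method = method;
  }

  /** Look up a public static method; a missing class or method is not an error. */
  static OptionalStaticMethod load(String className, String name, Class<?>... argTypes) {
    try {
      Method m = Class.forName(className).getMethod(name, argTypes);
      // Only static methods can be invoked without a target instance.
      return new OptionalStaticMethod(Modifier.isStatic(m.getModifiers()) ? m : null);
    } catch (ClassNotFoundException | NoSuchMethodException | LinkageError e) {
      return new OptionalStaticMethod(null);
    }
  }

  /** True if the method was found in this runtime. */
  boolean available() {
    return method != null;
  }

  /** Invoke the method, unwrapping the exception raised by the target. */
  Object invoke(Object... args) {
    if (method == null) {
      throw new UnsupportedOperationException("method not available in this runtime");
    }
    try {
      return method.invoke(null, args);
    } catch (InvocationTargetException e) {
      Throwable cause = e.getCause();
      if (cause instanceof RuntimeException) {
        throw (RuntimeException) cause;
      }
      if (cause instanceof Error) {
        throw (Error) cause;
      }
      throw new IllegalStateException(cause);
    } catch (IllegalAccessException e) {
      throw new IllegalStateException(e);
    }
  }
}
```

With java.lang.Math standing in for the real target, OptionalStaticMethod.load("java.lang.Math", "max", int.class, int.class) binds successfully, while a lookup on a class that is not on the classpath returns a binding whose available() is false. A library would point the lookup at the WrappedIO methods instead and fall back to the classic open() call when they are missing.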

  was:
Parquet, Avro etc. are still stuck building against older Hadoop releases.

This makes using new APIs hard (PARQUET-2171) and means that APIs which are 5 
years old such as HADOOP-15229 just aren't picked up.

This lack of openFile() adoption hurts working with files in cloud storage, as:
* extra HEAD requests are made
* read policies can't be explicitly set
* split start/end can't be passed down

Proposed
# create class org.apache.hadoop.io.WrappedOperations
# add methods to wrap the APIs
# test in contract tests via reflection loading; this verifies we have done it
properly.


> WrappedIO to export modern filesystem/statistics APIs in a reflection 
> friendly form
> -----------------------------------------------------------------------------------
>
>                 Key: HADOOP-19131
>                 URL: https://issues.apache.org/jira/browse/HADOOP-19131
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs, fs/azure, fs/s3
>    Affects Versions: 3.4.0
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>            Priority: Major
>              Labels: pull-request-available


