[
https://issues.apache.org/jira/browse/HADOOP-19131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Steve Loughran updated HADOOP-19131:
------------------------------------
Description:
Parquet, Avro etc. are still stuck building against older Hadoop releases.
This makes using new APIs hard (PARQUET-2171) and means that APIs which are 5
years old, such as HADOOP-15229, just aren't picked up.
This lack of openFile() adoption hurts working with files in cloud storage, as:
* extra HEAD requests are made
* read policies can't be explicitly set
* split start/end can't be passed down
HADOOP-18679 added a new WrappedIO class.
This jira proposes extending this with
* more of the filesystem/input stream methods
* IOStatistics
* Pull in parquet DynMethods to dynamically wrap and invoke through tests. This
class, DynamicWrappedIO, is intended to be copied into libraries (parquet,
iceberg) for their own use.
* Update existing tests to use the dynamic binding for end-to-end testing.
Then get this into the downstream libraries and use it where appropriate.
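For illustration only, the dynamic binding idea above could be sketched with
plain JDK reflection along these lines. This is not the DynMethods or
DynamicWrappedIO code: the names DynamicBindingSketch, StaticMethod, load and
invoke are made up for the example, java.lang.Math.floorMod stands in for a
wrapped method which is present, and org.example.MissingWrappedIO is a
deliberately nonexistent class standing in for a method absent from an older
release.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

/** Hypothetical sketch: bind to a static method by name, tolerating its absence. */
final class DynamicBindingSketch {

  /** A public static method resolved at runtime; unavailable if lookup failed. */
  static final class StaticMethod {
    private final Method method;  // null when the class or method was not found

    private StaticMethod(Method method) {
      this.method = method;
    }

    /** Look up className.methodName(argTypes) without failing if it is missing. */
    static StaticMethod load(String className, String methodName, Class<?>... argTypes) {
      try {
        Method m = Class.forName(className).getMethod(methodName, argTypes);
        // Only static methods can be invoked without a target instance.
        return new StaticMethod(Modifier.isStatic(m.getModifiers()) ? m : null);
      } catch (ClassNotFoundException | NoSuchMethodException | LinkageError e) {
        // Older release on the classpath: report the method as unavailable.
        return new StaticMethod(null);
      }
    }

    boolean available() {
      return method != null;
    }

    /** Invoke the method; callers are expected to check available() first. */
    Object invoke(Object... args) {
      if (method == null) {
        throw new UnsupportedOperationException("method not available");
      }
      try {
        return method.invoke(null, args);
      } catch (InvocationTargetException e) {
        // Unwrap so callers see the exception raised by the target method.
        Throwable cause = e.getCause();
        if (cause instanceof RuntimeException) {
          throw (RuntimeException) cause;
        }
        throw new IllegalStateException(cause);
      } catch (IllegalAccessException e) {
        throw new IllegalStateException(e);
      }
    }
  }

  public static void main(String[] args) {
    // Stand-in for a wrapped method that exists in the running JVM.
    StaticMethod present =
        StaticMethod.load("java.lang.Math", "floorMod", int.class, int.class);
    // Stand-in for a wrapped method missing from an older release.
    StaticMethod missing =
        StaticMethod.load("org.example.MissingWrappedIO", "openFile", String.class);

    System.out.println("present available: " + present.available());
    System.out.println("missing available: " + missing.available());
    if (present.available()) {
      System.out.println("floorMod(7, 3) = " + present.invoke(7, 3));
    }
  }
}
```

A library built against an older release could probe available() once at
startup and fall back to the classic open() path when the binding is absent.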
was:
parquet, avro etc are still stuck building with older hadoop releases.
This makes using new APIs hard (PARQUET-2171) and means that APIs which are 5
years old such as HADOOP-15229 just aren't picked up.
This lack of openFile() adoption hurts working with files in cloud storage as
* extra HEAD requests are made
* read policies can't be explicitly set
* split start/end can't be passed down
Proposed
# create class org.apache.hadoop.io.WrappedOperations
# add methods to wrap the APIs
# test in contract tests via reflection loading; this verifies we have done it
properly.
> WrappedIO to export modern filesystem/statistics APIs in a reflection
> friendly form
> -----------------------------------------------------------------------------------
>
> Key: HADOOP-19131
> URL: https://issues.apache.org/jira/browse/HADOOP-19131
> Project: Hadoop Common
> Issue Type: Improvement
> Components: fs, fs/azure, fs/s3
> Affects Versions: 3.4.0
> Reporter: Steve Loughran
> Assignee: Steve Loughran
> Priority: Major
> Labels: pull-request-available
>
> Parquet, Avro etc. are still stuck building against older Hadoop releases.
> This makes using new APIs hard (PARQUET-2171) and means that APIs which are 5
> years old, such as HADOOP-15229, just aren't picked up.
> This lack of openFile() adoption hurts working with files in cloud storage, as:
> * extra HEAD requests are made
> * read policies can't be explicitly set
> * split start/end can't be passed down
> HADOOP-18679 added a new WrappedIO class.
> This jira proposes extending this with
> * more of the filesystem/input stream methods
> * IOStatistics
> * Pull in parquet DynMethods to dynamically wrap and invoke through tests.
> This class, DynamicWrappedIO, is intended to be copied into libraries
> (parquet, iceberg) for their own use.
> * Update existing tests to use the dynamic binding for end-to-end testing.
> Then get this into the downstream libraries and use it where appropriate.