[ 
https://issues.apache.org/jira/browse/HADOOP-19388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-19388:
------------------------------------
    Description: 
Now that Hadoop 3.4.1 has shipped, we can link Iceberg up to it
through reflection: https://github.com/apache/iceberg/pull/10233

However, we can't put a test in there, even one which just talks to
the minio docker image which S3FileIO tests against, because
the tests would only work with Hadoop 3.4.1+.
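
For reference, a minimal sketch of the 3.4.1 bulk delete API as seen from the
caller's side. The bucket and file names are made up, and Iceberg itself binds
to a reflection-friendly wrapper of this rather than calling FileSystem
directly:

{code:java}
import java.util.Arrays;
import java.util.List;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BulkDelete;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BulkDeleteApiSketch {
  public static void main(String[] args) throws Exception {
    // base path under which all paths passed to bulkDelete() must live
    Path base = new Path("s3a://example-bucket/warehouse/table1");
    FileSystem fs = FileSystem.get(base.toUri(), new Configuration());

    // new in hadoop 3.4.1: filesystems can create a BulkDelete for a base path
    try (BulkDelete bulkDelete = fs.createBulkDelete(base)) {
      List<Path> files = Arrays.asList(
          new Path(base, "data/file-0001.parquet"),
          new Path(base, "data/file-0002.parquet"));
      // returns (path, error) pairs for anything which could not be deleted
      List<Map.Entry<Path, String>> failures = bulkDelete.bulkDelete(files);
      System.out.println("page size: " + bulkDelete.pageSize()
          + "; failures: " + failures.size());
    }
  }
}
{code}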

Proposed: add a validation test here, initially just with a JAR built from the
PR.
Initially this just says "it works as expected".
However, it will go on to become the regression test "it still works",
so there's no need to wait for downstream tests to be run and failures to be
reported back.

We need a test suite which:
* Adds a test-time dependency on the Iceberg JAR, with bulk delete going
through the HadoopFileIO class.
* Runs compliance tests: single/multi delete, complex names, directories,
missing paths.
* Is parameterized on single/multi object delete being enabled in s3a, and on
Iceberg using/not using bulk delete.
* Includes IOStats assertions to verify bulk delete was actually used (see the
sketch after this list).
* Maybe: mixes in some local file:// files so as to validate multiple stores
with different page sizes.
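
To illustrate, here's a rough sketch of what one parameterized round could
look like. It is only a sketch: the helper method, bucket URI and statistic
key are illustrative, and the Iceberg/Hadoop calls (HadoopFileIO,
IOStatisticsSupport, fs.s3a.multiobjectdelete.enable) need checking against
the real APIs.

{code:java}
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.statistics.IOStatistics;
import org.apache.hadoop.fs.statistics.IOStatisticsSupport;
import org.apache.iceberg.hadoop.HadoopFileIO;
import org.apache.iceberg.io.PositionOutputStream;

public class BulkDeleteThroughHadoopFileIOSketch {

  /** One round of the proposed test, for one multi-object-delete setting. */
  static void validateBulkDelete(String baseUri, boolean multiObjectDelete)
      throws Exception {
    Configuration conf = new Configuration();
    // parameterization point: single vs multi object delete in s3a
    conf.setBoolean("fs.s3a.multiobjectdelete.enable", multiObjectDelete);

    // create some files through the iceberg FileIO API
    HadoopFileIO fileIO = new HadoopFileIO(conf);
    List<String> files = new ArrayList<>();
    for (int i = 0; i < 10; i++) {
      String file = baseUri + "/data/file-" + i + ".parquet";
      try (PositionOutputStream out = fileIO.newOutputFile(file).create()) {
        out.write(("file " + i).getBytes(StandardCharsets.UTF_8));
      }
      files.add(file);
    }

    // the call under test: HadoopFileIO bulk delete, which should reach the
    // hadoop 3.4.1 bulk delete API through reflection
    fileIO.deleteFiles(files);

    // IOStats assertion: the bulk delete request counter of the (cached) s3a
    // filesystem instance must have been updated. Statistic name is from
    // memory; verify against StoreStatisticNames.
    FileSystem fs = new Path(baseUri).getFileSystem(conf);
    IOStatistics stats = IOStatisticsSupport.retrieveIOStatistics(fs);
    long bulkDeleteRequests = stats == null
        ? 0L
        : stats.counters().getOrDefault("object_bulk_delete_request", 0L);
    if (multiObjectDelete && bulkDeleteRequests == 0) {
      throw new AssertionError("bulk delete was not used: " + stats);
    }
  }
}
{code}

The real suite would run this through the parameterized contract-test
machinery rather than a standalone helper, and, maybe, repeat it against a
local file:// store with a different page size.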

I had started this within HADOOP-19385, with the Iceberg JAR as one of the
formats and the new test module including the base contract test suite.


Note also: until an Iceberg release ships with the PR in, this cannot be
merged.


  was:


Now that Hadoop 3.4.1 has shipped, we can link Iceberg up to it
through reflection: https://github.com/apache/iceberg/pull/10233

However, we can't put a test in there, even one which just talks to
the minio docker image which S3FileIO tests against, because
the tests would only work with Hadoop 3.4.1+.

Proposed: add a validation test here, initially just with a JAR built from the
PR.
Initially this just says "it works as expected".
However, it will go on to become the regression test "it still works",
so there's no need to wait for downstream tests to be run and failures to be
reported back.

We need a test suite which:
* Adds a test-time dependency on the Iceberg JAR, with bulk delete going
through the HadoopFileIO class.
* Runs compliance tests: single/multi delete, complex names, directories,
missing paths.
* Is parameterized on single/multi object delete being enabled in s3a, and on
Iceberg using/not using bulk delete.
* Includes IOStats assertions to verify bulk delete was actually used.
* Mixes in some local file:// files so as to validate multiple stores with
different page sizes.

I had started this within HADOOP-19385, with the Iceberg JAR as one of the
formats and the new test module including the base contract test suite.

But as the Iceberg JAR is Java 17+, it rapidly becomes unworkable.

Instead, it will all go into s3a with a new java17 profile which will
* add the Iceberg JAR dependency
* add a new src/test/java17 test source tree
* contain a minimal abstract base test
* add the s3a implementation

Once Hadoop is Java 17, it can be moved into the main branch.

Note also: until an Iceberg release actually ships with the PR in, this cannot
be merged.



> S3A: Validate bulk delete through Iceberg HadoopFileIO
> ------------------------------------------------------
>
>                 Key: HADOOP-19388
>                 URL: https://issues.apache.org/jira/browse/HADOOP-19388
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3, test
>    Affects Versions: 3.4.1
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>            Priority: Major
>


