[
https://issues.apache.org/jira/browse/HADOOP-19057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17816989#comment-17816989
]
ASF GitHub Bot commented on HADOOP-19057:
-----------------------------------------
steveloughran opened a new pull request, #6548:
URL: https://github.com/apache/hadoop/pull/6548
The AWS landsat data previously used in some S3A tests is no longer
accessible
This PR moves to the new external file
s3a://noaa-cors-pds/raw/2024/001/akse/AKSE001x.24_.gz
* Large enough file for scale tests
* Bucket supports anonymous access
* Ends in .gz to keep codec tests happy
* No spaces in path to keep bucket-info happy
### Test Code Changes
* Leaves the test key name alone: fs.s3a.scale.test.csvfile
* Rename methods and fields to remove "csv" from their names, using
"external file" instead, as we no longer require the file to be CSV.
* Path definition and helper methods have been moved to
PublicDatasetTestUtils
* Improve error reporting in ITestS3AInputStreamPerformance if the file is
too short
This is the V1 SDK version of the patch; it has deleted
ITestAWSStatisticCollection as part of the changes.
With S3 Select removed, there is no need for the file to be a CSV file;
there is a test which tries to unzip it; other tests have a minimum file size.
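The unzip check comes down to "does the object decompress cleanly as gzip". A minimal stand-alone sketch of that kind of validation, using Python's stdlib gzip module on local bytes rather than the actual S3 object (`is_valid_gzip` is illustrative, not a helper from the test suite):

```python
import gzip
import io

def is_valid_gzip(data: bytes) -> bool:
    """Return True if the byte stream decompresses cleanly as gzip."""
    try:
        with gzip.GzipFile(fileobj=io.BytesIO(data)) as gz:
            gz.read()
        return True
    except OSError:  # gzip.BadGzipFile is a subclass of OSError
        return False

# Round-trip a small payload to demonstrate the check.
payload = gzip.compress(b"station observations\n")
print(is_valid_gzip(payload))      # True
print(is_valid_gzip(b"not gzip"))  # False
```

This is why the replacement file must end in `.gz` and actually be gzip data: a codec chosen by suffix will attempt exactly this kind of decompression.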
Consult the JIRA for the settings to add to auth-keys.xml to switch earlier
builds to this same file.
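For older builds, the switch amounts to overriding that one key in auth-keys.xml with the path above (a minimal fragment; the quoted JIRA below lists the fuller set of recommended per-bucket options):

```xml
<property>
  <name>fs.s3a.scale.test.csvfile</name>
  <value>s3a://noaa-cors-pds/raw/2024/001/akse/AKSE001x.24_.gz</value>
  <description>file used in scale tests</description>
</property>
```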
Contributed by Steve Loughran
### How was this patch tested?
s3 london `-Dparallel-tests -DtestsThreadCount=8 -Dscale`
### For code changes:
- [X] Does the title of this PR start with the corresponding JIRA issue id
(e.g. 'HADOOP-17799. Your PR title ...')?
- [X] Object storage: have the integration tests been executed and the
endpoint declared according to the connector-specific documentation?
- [ ] If adding new dependencies to the code, are these dependencies
licensed in a way that is compatible for inclusion under [ASF
2.0](http://www.apache.org/legal/resolved.html#category-a)?
- [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`,
`NOTICE-binary` files?
> S3 public test bucket landsat-pds unreadable -needs replacement
> ---------------------------------------------------------------
>
> Key: HADOOP-19057
> URL: https://issues.apache.org/jira/browse/HADOOP-19057
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3, test
> Affects Versions: 3.4.0, 3.2.4, 3.3.9, 3.3.6, 3.5.0
> Reporter: Steve Loughran
> Assignee: Steve Loughran
> Priority: Critical
> Labels: pull-request-available
> Fix For: 3.5.0
>
>
> The s3 test bucket used in hadoop-aws tests of S3 select and large file reads
> is no longer publicly accessible
> {code}
> java.nio.file.AccessDeniedException: landsat-pds: getBucketMetadata() on
> landsat-pds: software.amazon.awssdk.services.s3.model.S3Exception: null
> (Service: S3, Status Code: 403, Request ID: 06QNYQ9GND5STQ2S, Extended
> Request ID:
> O+u2Y1MrCQuuSYGKRAWHj/5LcDLuaFS8owNuXXWSJ0zFXYfuCaTVLEP351S/umti558eKlUqV6U=):null
> {code}
> * Because HADOOP-18830 has cut s3 select, all we need in 3.4.1+ is a large
> file for some reading tests
> * changing the default value disables s3 select tests on older releases
> * if fs.s3a.scale.test.csvfile is set to " " then other tests which need it
> will be skipped
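> A minimal auth-keys.xml fragment that triggers that skip (a sketch; the single-space value is deliberate):
> {code}
> <property>
>   <name>fs.s3a.scale.test.csvfile</name>
>   <value> </value>
>   <description>single space: skips tests needing the external file</description>
> </property>
> {code}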
> Proposed
> * we locate a new large file under the (requester pays) s3a://usgs-landsat/
> bucket. All releases with HADOOP-18168 can use this
> * update 3.4.1 source to use this; document it
> * do something similar for 3.3.9 + maybe even cut s3 select there too.
> * document how to use it on older releases with requester-pays support
> * document how to completely disable it on older releases.
> h2. How to fix (most) landsat test failures on older releases
> add this to your auth-keys.xml file. Expect some failures in a few tests
> with hardcoded references to the bucket (assumed role delegation tokens)
> {code}
> <property>
>   <name>fs.s3a.scale.test.csvfile</name>
>   <value>s3a://noaa-cors-pds/raw/2023/017/ohfh/OHFH017d.23_.gz</value>
>   <description>file used in scale tests</description>
> </property>
> <property>
>   <name>fs.s3a.bucket.noaa-cors-pds.endpoint.region</name>
>   <value>us-east-1</value>
> </property>
> <property>
>   <name>fs.s3a.bucket.noaa-isd-pds.multipart.purge</name>
>   <value>false</value>
>   <description>Don't try to purge uploads in the read-only bucket, as
>     it will only create log noise.</description>
> </property>
> <property>
>   <name>fs.s3a.bucket.noaa-isd-pds.probe</name>
>   <value>0</value>
>   <description>Let's postpone existence checks to the first IO operation
>   </description>
> </property>
> <property>
>   <name>fs.s3a.bucket.noaa-isd-pds.audit.add.referrer.header</name>
>   <value>false</value>
>   <description>Do not add the referrer header</description>
> </property>
> <property>
>   <name>fs.s3a.bucket.noaa-isd-pds.prefetch.block.size</name>
>   <value>128k</value>
>   <description>Use a small prefetch size so tests fetch multiple
>     blocks</description>
> </property>
> <property>
>   <name>fs.s3a.select.enabled</name>
>   <value>false</value>
> </property>
> {code}
> Some delegation token tests will still fail; these have hard-coded references
> to the old bucket. *Do not worry about these.*
> {code}
> [ERROR] ITestDelegatedMRJob.testJobSubmissionCollectsTokens[0] »
> AccessDenied s3a://la...
> [ERROR] ITestDelegatedMRJob.testJobSubmissionCollectsTokens[1] »
> AccessDenied s3a://la...
> [ERROR] ITestDelegatedMRJob.testJobSubmissionCollectsTokens[2] »
> AccessDenied s3a://la...
> [ERROR]
> ITestRoleDelegationInFilesystem>ITestSessionDelegationInFilesystem.testDelegatedFileSystem:347->ITestSessionDelegationInFilesystem.readLandsatMetadata:614
> » AccessDenied
> [ERROR]
> ITestSessionDelegationInFilesystem.testDelegatedFileSystem:347->readLandsatMetadata:614
> » AccessDenied
> {code}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)