[ 
https://issues.apache.org/jira/browse/HADOOP-19057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17816989#comment-17816989
 ] 

ASF GitHub Bot commented on HADOOP-19057:
-----------------------------------------

steveloughran opened a new pull request, #6548:
URL: https://github.com/apache/hadoop/pull/6548

   
   The AWS landsat data previously used in some S3A tests is no longer
accessible.
   
   This PR moves to the new external file
   s3a://noaa-cors-pds/raw/2024/001/akse/AKSE001x.24_.gz
   
   * Large enough file for scale tests
   * Bucket supports anonymous access
   * Ends in .gz to keep codec tests happy
   * No spaces in path to keep bucket-info happy
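
The path-level criteria above can be expressed as a quick sanity check on a candidate file. This is only an illustrative sketch: the class and method names are hypothetical and not part of hadoop-aws, and it covers only what is visible in the path itself (the s3a scheme, the .gz suffix, no spaces). File size and anonymous access would need a real request against S3.

```java
import java.net.URI;

/** Hypothetical helper for illustration only; not part of hadoop-aws. */
public class ExternalFilePathCheck {

  /**
   * Returns true if the string looks usable as the external test file:
   * an s3a:// URI with a bucket name, no spaces, and a path ending in .gz.
   */
  public static boolean looksSuitable(String path) {
    if (path == null || path.contains(" ")) {
      return false; // spaces in the path upset the bucket-info test
    }
    URI uri;
    try {
      uri = URI.create(path);
    } catch (IllegalArgumentException e) {
      return false; // not a parseable URI
    }
    return "s3a".equals(uri.getScheme())
        && uri.getAuthority() != null          // bucket name is present
        && uri.getPath() != null
        && uri.getPath().endsWith(".gz");      // codec tests expect gzip
  }

  public static void main(String[] args) {
    String candidate = "s3a://noaa-cors-pds/raw/2024/001/akse/AKSE001x.24_.gz";
    System.out.println(candidate + " suitable: " + looksSuitable(candidate));
  }
}
```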
   
   Test Code Changes
   * Leaves the test key name alone: fs.s3a.scale.test.csvfile
   * Renames all methods and fields to remove "csv" from their names and use
"external file" instead; the file is no longer required to be CSV.
   * Path definition and helper methods have been moved to 
PublicDatasetTestUtils
   * Improve error reporting in ITestS3AInputStreamPerformance if the file is 
too short
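
As a rough illustration of how a test might resolve the file from the unchanged key, the sketch below uses java.util.Properties as a stand-in for Hadoop's Configuration so that it stays self-contained. The class and method names are hypothetical. Treating a blank value as "skip the test" follows the behaviour described in the JIRA; using the new file as the default is an assumption based on this PR.

```java
import java.util.Optional;
import java.util.Properties;

/** Hypothetical sketch; Properties stands in for Hadoop's Configuration. */
public class ExternalFileResolver {

  /** The test key name, left unchanged by this PR. */
  public static final String KEY = "fs.s3a.scale.test.csvfile";

  /** Assumed default: the new external file named in this PR. */
  public static final String DEFAULT_FILE =
      "s3a://noaa-cors-pds/raw/2024/001/akse/AKSE001x.24_.gz";

  /**
   * Resolves the external file to use.
   * An empty Optional means the value was blank and dependent tests
   * should be skipped.
   */
  public static Optional<String> resolve(Properties conf) {
    String value = conf.getProperty(KEY, DEFAULT_FILE).trim();
    return value.isEmpty() ? Optional.empty() : Optional.of(value);
  }

  public static void main(String[] args) {
    Properties conf = new Properties();
    System.out.println("default:  " + resolve(conf));
    conf.setProperty(KEY, " ");
    System.out.println("blank:    " + resolve(conf));
  }
}
```

A test would then skip itself when the Optional is empty rather than fail on a missing file.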
   
   This is the V1 SDK version of the patch; it has deleted 
ITestAWSStatisticCollection as part of the changes.
   
   With S3 Select removed, there is no need for the file to be a CSV file.
One test tries to unzip it, and other tests require a minimum file size.
   
   Consult the JIRA for the settings to add to auth-keys.xml to switch earlier 
builds to this same file.
   
   Contributed by Steve Loughran
   
   ### How was this patch tested?
   
   s3 london `-Dparallel-tests -DtestsThreadCount=8 -Dscale`
   
   ### For code changes:
   
   - [X] Does the title of this PR start with the corresponding JIRA issue id 
(e.g. 'HADOOP-17799. Your PR title ...')?
   - [X] Object storage: have the integration tests been executed and the 
endpoint declared according to the connector-specific documentation?
   - [ ] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)?
   - [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`, 
`NOTICE-binary` files?
   
   




> S3 public test bucket landsat-pds unreadable -needs replacement
> ---------------------------------------------------------------
>
>                 Key: HADOOP-19057
>                 URL: https://issues.apache.org/jira/browse/HADOOP-19057
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3, test
>    Affects Versions: 3.4.0, 3.2.4, 3.3.9, 3.3.6, 3.5.0
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 3.5.0
>
>
> The s3 test bucket used in hadoop-aws tests of S3 select and large file reads 
> is no longer publicly accessible
> {code}
> java.nio.file.AccessDeniedException: landsat-pds: getBucketMetadata() on 
> landsat-pds: software.amazon.awssdk.services.s3.model.S3Exception: null 
> (Service: S3, Status Code: 403, Request ID: 06QNYQ9GND5STQ2S, Extended 
> Request ID: 
> O+u2Y1MrCQuuSYGKRAWHj/5LcDLuaFS8owNuXXWSJ0zFXYfuCaTVLEP351S/umti558eKlUqV6U=):null
> {code}
> * Because HADOOP-18830 has cut s3 select, all we need in 3.4.1+ is a large 
> file for some reading tests
> * changing the default value disables s3 select tests on older releases
> * if fs.s3a.scale.test.csvfile is set to " " then other tests which need it 
> will be skipped
> Proposed
> * we locate a new large file under the (requester pays) s3a://usgs-landsat/ 
> bucket. All releases with HADOOP-18168 can use this
> * update 3.4.1 source to use this; document it
> * do something similar for 3.3.9 + maybe even cut s3 select there too.
> * document how to use it on older releases with requester-pays support
> * document how to completely disable it on older releases.
> h2. How to fix (most) landsat test failures on older releases
> Add this to your auth-keys.xml file. Expect some failures in a few tests 
> with hard-coded references to the bucket (assumed role delegation tokens).
> {code}
>   <property>
>     <name>fs.s3a.scale.test.csvfile</name>
>     <value>s3a://noaa-cors-pds/raw/2023/017/ohfh/OHFH017d.23_.gz</value>
>     <description>file used in scale tests</description>
>   </property>
>   <property>
>     <name>fs.s3a.bucket.noaa-cors-pds.endpoint.region</name>
>     <value>us-east-1</value>
>   </property>
>   <property>
>     <name>fs.s3a.bucket.noaa-isd-pds.multipart.purge</name>
>     <value>false</value>
>     <description>Don't try to purge uploads in the read-only bucket, as
>     it will only create log noise.</description>
>   </property>
>   <property>
>     <name>fs.s3a.bucket.noaa-isd-pds.probe</name>
>     <value>0</value>
>     <description>Let's postpone existence checks to the first IO operation 
> </description>
>   </property>
>   <property>
>     <name>fs.s3a.bucket.noaa-isd-pds.audit.add.referrer.header</name>
>     <value>false</value>
>     <description>Do not add the referrer header</description>
>   </property>
>   <property>
>     <name>fs.s3a.bucket.noaa-isd-pds.prefetch.block.size</name>
>     <value>128k</value>
>     <description>Use a small prefetch size so tests fetch multiple 
> blocks</description>
>   </property>
>   <property>
>     <name>fs.s3a.select.enabled</name>
>     <value>false</value>
>   </property>
> {code}
> Some delegation token tests will still fail; these have hard-coded references 
> to the old bucket. *Do not worry about these*
> {code}
> [ERROR]   ITestDelegatedMRJob.testJobSubmissionCollectsTokens[0] » 
> AccessDenied s3a://la...
> [ERROR]   ITestDelegatedMRJob.testJobSubmissionCollectsTokens[1] » 
> AccessDenied s3a://la...
> [ERROR]   ITestDelegatedMRJob.testJobSubmissionCollectsTokens[2] » 
> AccessDenied s3a://la...
> [ERROR]   
> ITestRoleDelegationInFilesystem>ITestSessionDelegationInFilesystem.testDelegatedFileSystem:347->ITestSessionDelegationInFilesystem.readLandsatMetadata:614
>  » AccessDenied
> [ERROR]   
> ITestSessionDelegationInFilesystem.testDelegatedFileSystem:347->readLandsatMetadata:614
>  » AccessDenied
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
