+1 (non-binding)

Bests,
Dongjoon.

On 2021/06/08 12:23:43, Steve Loughran <ste...@cloudera.com.INVALID> wrote: 
> +1, binding.
> 
> Awesome piece of work!
> 
> I've done three forms of qualification, all related to s3 and azure storage
> 
>    1. tarball validation, CLI use
>    2. build/test of downstream modules off the maven artifacts; mine and
>    some other ASF ones. I (and it is very much me) have broken some
>    downstream modules' tests, as I will discuss below. PRs submitted to
>    the relevant projects.
>    3. local rerun of the hadoop-aws and hadoop-azure test suites
> 
> 
> *Regarding issues which surfaced*
> 
> Wei-Chiu: can you upload your public GPG key to the public keyservers?
> The gpg client apps let you do this. Then we can coordinate signing each
> other's keys.
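> 
> For reference, a sketch of the upload (key ID taken from the signature
> output below; the keyserver choice is illustrative):
> 
> gpg --keyserver hkps://hkps.pool.sks-keyservers.net --send-keys 0xB362E1C021854B9D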
> 
> Filed PRs for the test regressions:
> https://github.com/apache/hbase-filesystem/pull/23
> https://github.com/GoogleCloudDataproc/hadoop-connectors/pull/569
> 
> *Artifact validation*
> 
> SHA checksum good:
> 
> 
> shasum -a 512 hadoop-3.3.1-RC3.tar.gz
> b80e0a8785b0f3d75d9db54340123872e39bad72cc60de5d263ae22024720e6e824e022090f01e248bf105e03b0f06163729adbe15b5b0978bae0447571e22eb
>  hadoop-3.3.1-RC3.tar.gz
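> 
> Equivalently, if the published .sha512 file sits alongside the tarball in
> the plain "digest  filename" format, shasum can do the comparison itself:
> 
> shasum -a 512 -c hadoop-3.3.1-RC3.tar.gz.sha512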
> 
> 
> GPG: trickier, because Wei-Chiu wasn't trusted
> 
> > gpg --verify hadoop-3.3.1-RC3.tar.gz.asc
> 
> gpg: assuming signed data in 'hadoop-3.3.1-RC3.tar.gz'
> gpg: Signature made Tue Jun  1 11:00:41 2021 BST
> gpg:                using RSA key CD32D773FF41C3F9E74BDB7FB362E1C021854B9D
> gpg: requesting key 0xB362E1C021854B9D from hkps server
> hkps.pool.sks-keyservers.net
> gpg: Can't check signature: No public key
> 
> 
> *Wei-Chiu: can you add your public keys to the GPG key servers*
> 
> To validate the keys I went to the directory where I have our site under
> svn (https://dist.apache.org/repos/dist/release/hadoop/common), and, after
> reinstalling svn (where did it go? when did it go?), did an svn update to
> get the keys.
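> 
> Roughly, from that checkout:
> 
> svn update
> gpg --import KEYS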
> 
> The gpg import of the KEYS file added:
> 
> gpg: key 0x386D80EF81E7469A: public key "Brahma Reddy Battula (CODE SIGNING
> KEY) <bra...@apache.org>" imported
> gpg: key 0xFC8D04357BB49FF0: public key "Sammi Chen (CODE SIGNING KEY) <
> sammic...@apache.org>" imported
> gpg: key 0x36243EECE206BB0D: public key "Masatake Iwasaki (CODE SIGNING
> KEY) <iwasak...@apache.org>" imported
> *gpg: key 0xB362E1C021854B9D: public key "Wei-Chiu Chuang
> <weic...@apache.org>" imported*
> 
> This time the import did work, but Wei-Chiu isn't trusted by anyone yet:
> 
> gpg --verify hadoop-3.3.1-RC3.tar.gz.asc
> gpg: assuming signed data in 'hadoop-3.3.1-RC3.tar.gz'
> gpg: Signature made Tue Jun  1 11:00:41 2021 BST
> gpg:                using RSA key CD32D773FF41C3F9E74BDB7FB362E1C021854B9D
> gpg: Good signature from "Wei-Chiu Chuang <weic...@apache.org>" [unknown]
> gpg: WARNING: This key is not certified with a trusted signature!
> gpg:          There is no indication that the signature belongs to the
> owner.
> Primary key fingerprint: CD32 D773 FF41 C3F9 E74B  DB7F B362 E1C0 2185 4B9D
> 
> (Wei-Chiu, let's coordinate signing each other's public keys via a slack
> channel; you need to be in the apache web of trust)
> 
> 
> > time gunzip hadoop-3.3.1-RC3.tar.gz
> 
> (5 seconds)
> 
> cd into the hadoop dir;
> cp my confs in: cp ~/(somewhere)/hadoop-conf/*  etc/hadoop/
> cp the hadoop-azure dependencies from share/hadoop/tools/lib/ to
> share/hadoop/common/lib (products built targeting Azure expect them there)
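> 
> Roughly, from the expanded tarball (exact JAR names vary by release):
> 
> cp share/hadoop/tools/lib/hadoop-azure-*.jar \
>    share/hadoop/tools/lib/azure-*.jar \
>    share/hadoop/common/lib/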
> 
> run: all the s3a "qualifying an AWS SDK update" commands
> https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/testing.html#Qualifying_an_AWS_SDK_Update
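> 
> For illustration, one of the commands from that page (bucket name mine):
> 
> bin/hadoop s3guard bucket-info s3a://stevel-london/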
> 
> run: basic abfs:// FS operations; again no problems.
> FWIW I think we should consider having the hadoop-aws module and its
> dependencies, and the azure ones, in hadoop-common/lib. I can get them
> there through env vars and the s3guard shell sets things up, but azure is
> fiddly.
> 
> *Build and test cloudstore JAR; invoke from CLI*
> 
> This is my cloud-storage extension library
> https://github.com/steveloughran/cloudstore
> 
> I've always intended to put it into hadoop, but as it is, it's where a lot
> of my diagnostics live, and a quick way to put together fixes: "here's a
> faster du" ("dux").
> 
> https://github.com/steveloughran/cloudstore.git
> 
> Modify the hadoop-3.3 profile to use the 3.3.1 artifacts, then build with
> the snapshots-and-staging profile enabled. Because I'd not (yet) built any
> 3.3.1 artifacts locally, this fetched them from the ASF maven staging
> repository:
> 
> mvn package -Phadoop-3.3 -Pextra -Psnapshots-and-staging
> 
> 
> Set up the env var $CLOUDSTORE to point to the JAR and $BUCKET to an s3a
> bucket, then run various commands (storediag, cloudup, ...). As an
> example, here's the "dux" command, which is "hadoop fs -du" with a
> parallel scan underneath the directory for better scaling.
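> 
> Env setup, for instance (the JAR path is whatever the build produced):
> 
> export CLOUDSTORE=$(pwd)/target/cloudstore-1.0.jar
> export BUCKET=s3a://stevel-london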
> 
> 
> bin/hadoop jar $CLOUDSTORE dux  -threads 64 -limit 1000 -verbose
> s3a://stevel-london/
> 
> output is in
> https://gist.github.com/steveloughran/664d30cef20f605f3164ad01f92a458a
> 
> *Build and unit-test google GCS*
> 
> 
> Two test failures, one of which was classpath related and the other just a
> new rename contract test needing a new setting in gs.xml to declare what
> rename of file over file does.
> 
> Everything is covered in:
> https://github.com/GoogleCloudDataproc/hadoop-connectors/pull/569
> 
> Classpath: AssertJ not coming through the hadoop-common test JAR's dependencies.
> 
> [ERROR]
> com.google.cloud.hadoop.fs.gcs.contract.TestInMemoryGoogleContractRootDirectory.testSimpleRootListing
>  Time elapsed: 0.093 s  <<< ERROR!
> java.lang.NoClassDefFoundError: org/assertj/core/api/Assertions
> Caused by: java.lang.ClassNotFoundException: org.assertj.core.api.Assertions
> 
> 
> This happens because I added some tests to AbstractContractRenameTest
> which use AssertJ assertions.
> AssertJ is declared in test scope for the hadoop-common test JAR, but it's
> somehow not propagating. HBoss has the same issue:
> 
> 
>     <dependency>
>       <groupId>org.assertj</groupId>
>       <artifactId>assertj-core</artifactId>
>       <scope>test</scope>
>     </dependency>
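> 
> The downstream workaround in the PRs above is, in essence, to declare the
> dependency explicitly in the consuming module's own pom (version
> illustrative):
> 
>     <dependency>
>       <groupId>org.assertj</groupId>
>       <artifactId>assertj-core</artifactId>
>       <version>3.19.0</version>
>       <scope>test</scope>
>     </dependency>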
> 
> I really don't understand what is up with our declared exports; I've just
> reviewed them, and there's nothing we can do about it that I can see.
> 
> The rename test failure is from a new test, with the expected behaviour
> needing to be defined for the store.
> 
> [ERROR] Failures:
> [ERROR]
> TestInMemoryGoogleContractRename>AbstractContractRenameTest.testRenameFileOverExistingFile:131->Assert.fail:89
> expected
> rename(gs://fake-in-memory-test-bucket/contract-test/source-256.txt,
> gs://fake-in-memory-test-bucket/contract-test/dest-512.txt) to be rejected
> with exception, but got false
> 
> Fix: add "fs.contract.rename-returns-false-if-dest-exists" = true to the
> XML contract file.
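> 
> In the standard contract-XML form, that is:
> 
>     <property>
>       <name>fs.contract.rename-returns-false-if-dest-exists</name>
>       <value>true</value>
>     </property>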
> 
> *Build and test HBoss*
> 
> This is the HBase extension which uses ZooKeeper to lock file accesses on S3.
> 
> I've broken their build through changes to the internal S3 client factory,
> made so that some new client options could be passed down (HADOOP-13551).
> That change moved to a parameter object for the factory, so we can add
> future options without breaking the signature again (mehakmeet already
> has, in HADOOP-17705).
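> 
> A minimal sketch of that pattern (names hypothetical, not the actual
> hadoop-aws interface):
> 
> // Adding a new option means a new field and wither here, not a new
> // method parameter, so existing implementations keep compiling.
> public interface ClientFactory {
>   Object createClient(CreationParameters params);
> 
>   final class CreationParameters {
>     private String endpoint;
>     private boolean requesterPays;
> 
>     public CreationParameters withEndpoint(String e) {
>       endpoint = e; return this;
>     }
>     public CreationParameters withRequesterPays(boolean b) {
>       requesterPays = b; return this;
>     }
>   }
> }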
> 
> https://issues.apache.org/jira/browse/HBASE-25900
> 
> 
> Got an initial PR up, though I will need to do more so that it will also
> compile/test against older releases:
> https://github.com/apache/hbase-filesystem/pull/23
> 
> *Build spark, then test S3A Committers through it*
> 
> Built spark-3 against 3.3.1, then ran integration tests against S3 London.
> 
> Tests are in: https://github.com/hortonworks-spark/cloud-integration.git
> 
> Most of an afternoon was frittered away dealing with the fact that the
> spark version move (2.4 to 3.2) meant a scalatest upgrade from 3.0 to 3.2,
> *and every single test failed to compile because the scalatest project
> moved the foundational test suite into a new package*
> (org.scalatest.FunSuite is now org.scalatest.funsuite.AnyFunSuite). I had
> to do the same upgrade to test my WiP manifest committer (MAPREDUCE-7341)
> against ABFS, so it's not completely wasted. It does mean that the module
> and its tests are spark 3+ only.
> 
> *hadoop-aws and hadoop-azure test suites*
> 
> For these I checked out branch-3.3.1 and ran the test suites in the
> hadoop-azure and hadoop-aws modules; this triggered a rebuild of those
> two modules.
> 
> I did this after doing all the other checks, so everything else was
> qualified against the genuine RC3 artifacts.
> 
> 
> hadoop-aws
> 
> run 1: -Dparallel-tests -DtestsThreadCount=5 -Dmarkers=keep
> run 2: -Dparallel-tests -DtestsThreadCount=6 -Ds3guard -Dscale -Ddynamo
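> 
> Each of these is an mvn verify in the module directory; run 1, spelled
> out:
> 
> mvn verify -Dparallel-tests -DtestsThreadCount=5 -Dmarkers=keep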
> 
> azure
>  -Dparallel-tests=abfs -DtestsThreadCount=5 -Dscale
> 
>  [ERROR] Errors:
>  [ERROR]
> ITestAbfsFileSystemContractSecureDistCp>AbstractContractDistCpTest.testDistCpWithIterator:642
> » TestTimedOut
>  [INFO]
> 
> This is https://issues.apache.org/jira/browse/HADOOP-17628
> 
> 
> Overall then:
> 
>    1. All production code good.
>    2. Some expansion of the filesystem tests requires changes downstream,
>    and the change in the S3 client factory from HADOOP-13551 stopped the
>    HBoss tests, which use an internal interface, from compiling. The move
>    to a parameter object (and documenting this use) is intended to prevent
>    this recurring.
> 
> 
