+1 (non-binding)

Bests,
Dongjoon.
On 2021/06/08 12:23:43, Steve Loughran <ste...@cloudera.com.INVALID> wrote:
> +1, binding.
>
> Awesome piece of work!
>
> I've done three forms of qualification, all related to S3 and Azure storage:
>
> 1. tarball validation, CLI use
> 2. build/test of downstream modules off the maven artifacts; mine and some
>    other ASF ones. I (and it is very much me) have broken some downstream
>    modules' tests, as I will discuss below. PRs submitted to the relevant
>    projects.
> 3. local rerun of the hadoop-aws and hadoop-azure test suites
>
> *Regarding issues which surfaced*
>
> Wei-Chiu: can you register your private GPG key with the public keystores?
> The gpg client apps let you do this. Then we can coordinate signing each
> other's keys.
>
> Filed PRs for the test regressions:
> https://github.com/apache/hbase-filesystem/pull/23
> https://github.com/GoogleCloudDataproc/hadoop-connectors/pull/569
>
> *Artifact validation*
>
> SHA checksum good:
>
>   shasum -a 512 hadoop-3.3.1-RC3.tar.gz
>   b80e0a8785b0f3d75d9db54340123872e39bad72cc60de5d263ae22024720e6e824e022090f01e248bf105e03b0f06163729adbe15b5b0978bae0447571e22eb  hadoop-3.3.1-RC3.tar.gz
>
> GPG: trickier, because Wei-Chiu wasn't trusted:
>
>   gpg --verify hadoop-3.3.1-RC3.tar.gz.asc
>   gpg: assuming signed data in 'hadoop-3.3.1-RC3.tar.gz'
>   gpg: Signature made Tue Jun 1 11:00:41 2021 BST
>   gpg: using RSA key CD32D773FF41C3F9E74BDB7FB362E1C021854B9D
>   gpg: requesting key 0xB362E1C021854B9D from hkps server hkps.pool.sks-keyservers.net
>   gpg: Can't check signature: No public key
>
> *Wei-Chiu: can you add your public keys to the GPG key servers*
>
> To validate the keys I went to the directory where I have our site under
> svn (https://dist.apache.org/repos/dist/release/hadoop/common) and, after
> reinstalling svn (where did it go? when did it go?), did an svn update to
> get the keys.
>
> Did a gpg import of the KEYS file, which added:
>
>   gpg: key 0x386D80EF81E7469A: public key "Brahma Reddy Battula (CODE SIGNING KEY) <bra...@apache.org>" imported
>   gpg: key 0xFC8D04357BB49FF0: public key "Sammi Chen (CODE SIGNING KEY) <sammic...@apache.org>" imported
>   gpg: key 0x36243EECE206BB0D: public key "Masatake Iwasaki (CODE SIGNING KEY) <iwasak...@apache.org>" imported
>   *gpg: key 0xB362E1C021854B9D: public key "Wei-Chiu Chuang <weic...@apache.org>" imported*
>
> This time the import did work, but Wei-Chiu isn't trusted by anyone yet:
>
>   gpg --verify hadoop-3.3.1-RC3.tar.gz.asc
>   gpg: assuming signed data in 'hadoop-3.3.1-RC3.tar.gz'
>   gpg: Signature made Tue Jun 1 11:00:41 2021 BST
>   gpg: using RSA key CD32D773FF41C3F9E74BDB7FB362E1C021854B9D
>   gpg: Good signature from "Wei-Chiu Chuang <weic...@apache.org>" [unknown]
>   gpg: WARNING: This key is not certified with a trusted signature!
>   gpg: There is no indication that the signature belongs to the owner.
>   Primary key fingerprint: CD32 D773 FF41 C3F9 E74B DB7F B362 E1C0 2185 4B9D
>
> (Wei-Chiu, let's coordinate signing each other's public keys via a slack
> channel; you need to be in the apache web of trust.)
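>
> Once we've checked fingerprints over a trusted channel, getting a key into
> the web of trust is just a matter of certifying and republishing it;
> something like this (a rough sketch, not rerun here, using whichever
> keyserver gpg is configured with) should do:
>
>   # fetch the key and show its fingerprint for out-of-band verification
>   gpg --recv-keys 0xB362E1C021854B9D
>   gpg --fingerprint 0xB362E1C021854B9D
>   # certify it with your own key, then publish the certification
>   gpg --sign-key 0xB362E1C021854B9D
>   gpg --send-keys 0xB362E1C021854B9D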
>
>   time gunzip hadoop-3.3.1-RC3.tar.gz
>
> (5 seconds)
>
> cd into the hadoop dir;
> cp my confs in: cp ~/(somewhere)/hadoop-conf/* etc/hadoop/
> cp the hadoop-azure dependencies from share/hadoop/tools/lib/ to
> share/hadoop/common/lib (products built targeting Azure put things there)
>
> run: all the s3a "qualifying an AWS SDK update" commands
> https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/testing.html#Qualifying_an_AWS_SDK_Update
>
> run: basic abfs:// FS operations; again, no problems.
>
> FWIW I think we should consider having the hadoop-aws module and its
> dependencies, and the azure ones, in hadoop-common/lib. I can get them there
> through env vars, and the s3guard shell sets things up, but azure is fiddly.
>
> *Build and test cloudstore JAR; invoke from CLI*
>
> This is my cloud-storage extension library:
> https://github.com/steveloughran/cloudstore
>
> I've always intended to put it into hadoop, but as it is it's where a lot of
> the diagnostics live and a quick way to put together fixes ("here's a faster
> du: dux").
>
> https://github.com/steveloughran/cloudstore.git
>
> Modify the hadoop-3.3 profile to use 3.3.1 artifacts, then build with
> snapshots enabled. Because I'd not (yet) built any 3.3.1 artifacts locally,
> this fetched them from maven staging:
>
>   mvn package -Phadoop-3.3 -Pextra -Psnapshots-and-staging
>
> Set up the env var $CLOUDSTORE to point to the JAR and $BUCKET to an s3a
> bucket, then run various commands (storediag, cloudup, ...). As an example,
> here's the "dux" command, which is "hadoop fs -du" with a parallel scan
> underneath the dir for better scaling:
>
>   bin/hadoop jar $CLOUDSTORE dux -threads 64 -limit 1000 -verbose s3a://stevel-london/
>
> Output is in
> https://gist.github.com/steveloughran/664d30cef20f605f3164ad01f92a458a
>
> *Build and (unit) test Google GCS*
>
> Two test failures, one of which was classpath related and the other just a
> new rename contract test needing a new setting in gs.xml to declare what
> rename of a file over a file does.
>
> Everything is covered in
> https://github.com/GoogleCloudDataproc/hadoop-connectors/pull/569
>
> Classpath: assertJ not coming through the hadoop-common test JAR dependencies.
>
>   [ERROR] com.google.cloud.hadoop.fs.gcs.contract.TestInMemoryGoogleContractRootDirectory.testSimpleRootListing  Time elapsed: 0.093 s <<< ERROR!
>   java.lang.NoClassDefFoundError: org/assertj/core/api/Assertions
>   Caused by: java.lang.ClassNotFoundException: org.assertj.core.api.Assertions
>
> This happens because I added some tests to AbstractContractRenameTest which
> use assertJ assertions. assertJ is declared in test scope for the
> hadoop-common test JAR, but it's somehow not propagating; HBoss has the same
> issue.
>
>   <dependency>
>     <groupId>org.assertj</groupId>
>     <artifactId>assertj-core</artifactId>
>     <scope>test</scope>
>   </dependency>
>
> I really don't understand what is up with our declared exports; I've just
> reviewed them. Nothing we can do about it that I can see.
>
> The rename test failure is from a new test, with the expected behaviour
> needing definition:
>
>   [ERROR] Failures:
>   [ERROR] TestInMemoryGoogleContractRename>AbstractContractRenameTest.testRenameFileOverExistingFile:131->Assert.fail:89
>   expected rename(gs://fake-in-memory-test-bucket/contract-test/source-256.txt,
>   gs://fake-in-memory-test-bucket/contract-test/dest-512.txt) to be rejected
>   with exception, but got false
>
> Fix: add "fs.contract.rename-returns-false-if-dest-exists" = true to the
> XML contract.
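>
> For the curious, the gs.xml change is just the usual contract option; a
> sketch only, with the exact placement in the connector's test resources left
> to that project:
>
>   <!-- rename() returns false, rather than raising an exception,
>        when the destination file already exists -->
>   <property>
>     <name>fs.contract.rename-returns-false-if-dest-exists</name>
>     <value>true</value>
>   </property>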
>
> *Build and test HBoss*
>
> This is the HBase extension to use ZK to lock file accesses on S3.
>
> I've broken their build through changes to the internal S3 client factory,
> as some new client options were passed down (HADOOP-13551). That change
> moved to a new build parameter object, so we can add future changes without
> breaking the signature again (mehakmeet already has, in HADOOP-17705).
>
> https://issues.apache.org/jira/browse/HBASE-25900
>
> Got an initial PR up, though I will need to do more so that it will also
> compile/test against older builds:
> https://github.com/apache/hbase-filesystem/pull/23
>
> *Build spark, then test S3A Committers through it*
>
> Built spark-3 against 3.3.1, then ran the integration tests against S3 London.
>
> Tests are in https://github.com/hortonworks-spark/cloud-integration.git
>
> Most of an afternoon was frittered away dealing with the fact that the spark
> version move (2.4 to 3.2) meant a scalatest upgrade from 3.0 to 3.2.0, *and
> every single test failed to compile because the scalatest project moved the
> foundational test suite into a new package*. I had to do that same upgrade
> to test my WiP manifest committer (MAPREDUCE-7341) against ABFS, so it's not
> completely wasted. It does mean that the module and its tests are scalatest
> 3.2+ only.
>
> *hadoop-aws and hadoop-azure test suites*
>
> For these I checked out branch-3.3.1, rebuilt it and ran the test suites in
> the hadoop-azure and hadoop-aws modules. This triggered a rebuild of those
> two modules.
>
> I did this after doing all the other checks, so everything else was
> qualified against the genuine RC3 artifacts.
>
> hadoop-aws:
>
>   run 1: -Dparallel-tests -DtestsThreadCount=5 -Dmarkers=keep
>   run 2: -Dparallel-tests -DtestsThreadCount=6 -Ds3guard -Dscale -Ddynamo
>
> hadoop-azure:
>
>   -Dparallel-tests=abfs -DtestsThreadCount=5 -Dscale
>
>   [ERROR] Errors:
>   [ERROR] ITestAbfsFileSystemContractSecureDistCp>AbstractContractDistCpTest.testDistCpWithIterator:642 » TestTimedOut
>
> This is https://issues.apache.org/jira/browse/HADOOP-17628
>
> Overall then:
>
> 1. All production code good.
> 2. Some expansion of the filesystem tests requires changes downstream, and
>    the change to the S3 client factory from HADOOP-13551 stopped the HBoss
>    tests, which use an internal interface, from compiling. The move to a
>    parameter object (and documenting this use) is intended to prevent this
>    recurring.
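>
> PS: for anyone rerunning those suites, the flag lists above translate
> roughly into the following invocations (a sketch; it assumes the usual
> auth-keys.xml / azure-auth-keys.xml test-bucket settings are already in
> place, per each module's testing docs, and that you are at the root of the
> branch-3.3.1 checkout):
>
>   cd hadoop-tools/hadoop-aws
>   mvn verify -Dparallel-tests -DtestsThreadCount=5 -Dmarkers=keep
>   mvn verify -Dparallel-tests -DtestsThreadCount=6 -Ds3guard -Dscale -Ddynamo
>
>   cd ../hadoop-azure
>   mvn verify -Dparallel-tests=abfs -DtestsThreadCount=5 -Dscale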