Thanks Ayush for finding this issue. It looks like there is a problem with the ARM binaries.

The likely reason is that the create-release script doesn't work in the ARM Docker container, although the same steps work when run manually after logging into the container. So I was creating the tars manually, and it seems that didn't work properly. These are the two YARN issues because of which I had to create them manually:

https://issues.apache.org/jira/browse/YARN-11712
https://issues.apache.org/jira/browse/YARN-11713

So we have decided to remove the aarch* binaries from the 3.4.1 release and target fixing this in 3.4.2.
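For anyone who wants to double check the discrepancy reported below, here is a minimal sketch of an automated comparison. It is not part of create-release or hadoop-release-support; the function names are made up for illustration, and it assumes each binary tarball has a single top-level directory (such as hadoop-3.4.1/) with no leading "./" in entry names.

```python
import tarfile


def entries_under(tar_path, subdir="share/hadoop/yarn"):
    """Return the names of regular files directly under `subdir`.

    The top-level directory of the tarball is stripped so that tarballs
    built for different platforms can be compared with each other.
    """
    names = set()
    # Mode "r" (the default) auto-detects gzip compression.
    with tarfile.open(tar_path) as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue
            # Drop the top-level directory component (assumed to exist).
            _, _, rel = member.name.partition("/")
            parent, _, base = rel.rpartition("/")
            if parent == subdir:
                names.add(base)
    return names


def compare_tarballs(x86_tar, arm_tar, subdir="share/hadoop/yarn"):
    """Return (only_in_x86, only_in_arm) as sorted lists of file names."""
    x86 = entries_under(x86_tar, subdir)
    arm = entries_under(arm_tar, subdir)
    return sorted(x86 - arm), sorted(arm - x86)
```

Calling compare_tarballs() with the paths of the x86 and aarch64 binary tarballs should list the hadoop-yarn-*.jar files in the first result if they are missing from the ARM tarball, as in Ayush's listing.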
On Fri, Oct 18, 2024 at 1:41 PM Ayush Saxena <ayush...@gmail.com> wrote:

> Thanks Mukund for sharing the details, I might have missed that
> discussion. Do mention the purpose on the website once we upload the
> lean jar, so that relevant people can use it.
>
> I am holding my vote due to [3], which in general looks like a blocker to me,
> if I haven't messed up. It would be great if someone could help validate
> that; I believe it is the cause of [2].
>
> * Built from source
> * Verified checksums
> * Verified signatures
> * Validated no code diff b/w the git tag & src tar
> * Verified the output of `hadoop version` [1]
> * Checked the NOTICE & LICENSE file
> * Ran some basic HDFS shell commands
> * Ran some Erasure Coding related commands
> * Ran example jobs (TeraGen, TeraSort, WordCount) [2]
> * Browsed through the UI (NN, DN, RM, NM & JHS)
> * Skimmed over the contents of maven artifacts, Release Notes & ChangeLog
> * Discrepancy in contents of aarch64 tar & x86 tar [3]
>
> [1]
>
> The hadoop version output inside the aarch64 tar shows the source
> repository as Unknown, which isn't the case in the other tar, nor was it
> the case in the last 3.4.0 release. Not blocking, but we should figure out
> & fix it in the next release.
>
> ```
> ayushsaxena@ayushsaxena hadoop-3.4.1 % bin/hadoop version
> Hadoop 3.4.1
> Source code repository Unknown -r Unknown
> Compiled by mthakur on 2024-10-10T08:24Z
> Compiled on platform linux-aarch_64
> Compiled with protoc 3.23.4
> From source with checksum 7292fe9dba5e2e44e3a9f763fce3e680
> ```
>
> [2] While trying to start ResourceManager via the ARM binary, it is
> failing for me:
> ```
> ayushsaxena@ayushsaxena hadoop-3.4.1 % bin/yarn resourcemanager
> Error: Could not find or load main class org.apache.hadoop.yarn.server.resourcemanager.ResourceManager
> ```
> It didn't used to happen earlier, but I sorted it out by "export
> HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/Users/ayushsaxena/code/RC/hadoop-3.4.1/hadoop-3.4.1/share/hadoop/tools/lib/*",
> Considering I am not pro with YARN, I might have messed up the setup.
> My mapred settings have Tez related stuff as well, but the same works
> if I use the x86 binary....
>
> [3] Investigating [2], I think the YARN Jars are missing in the
> aarch64 share/hadoop/yarn & they are there in the share/hadoop/tools
> directory
>
> in x86
> ```
> ayushsaxena@ayushsaxena hadoop-3.4.1 % ls x86/hadoop-3.4.1/share/hadoop/yarn
> csi                                                       hadoop-yarn-server-router-3.4.1.jar
> hadoop-yarn-api-3.4.1.jar                                 hadoop-yarn-server-sharedcachemanager-3.4.1.jar
> hadoop-yarn-applications-catalog-webapp-3.4.1.war         hadoop-yarn-server-tests-3.4.1.jar
> hadoop-yarn-applications-distributedshell-3.4.1.jar       hadoop-yarn-server-timeline-pluginstorage-3.4.1.jar
> hadoop-yarn-applications-mawo-core-3.4.1.jar              hadoop-yarn-server-web-proxy-3.4.1.jar
> hadoop-yarn-applications-unmanaged-am-launcher-3.4.1.jar  hadoop-yarn-services-api-3.4.1.jar
> hadoop-yarn-client-3.4.1.jar                              hadoop-yarn-services-core-3.4.1.jar
> hadoop-yarn-common-3.4.1.jar                              lib
> hadoop-yarn-registry-3.4.1.jar                            sources
> hadoop-yarn-server-applicationhistoryservice-3.4.1.jar    test
> hadoop-yarn-server-common-3.4.1.jar                       timelineservice
> hadoop-yarn-server-globalpolicygenerator-3.4.1.jar        webapps
> hadoop-yarn-server-nodemanager-3.4.1.jar                  yarn-service-examples
> hadoop-yarn-server-resourcemanager-3.4.1.jar
> ayushsaxena@ayushsaxena hadoop-3.4.1 %
> ```
>
> And in aarch64 they aren't there
>
> ```
> ayushsaxena@ayushsaxena hadoop-3.4.1 % ls -l aarch/hadoop-3.4.1/share/hadoop/yarn
> total 0
> drwxr-xr-x   3 ayushsaxena  staff    96 Oct 10 15:34 csi
> drwxr-xr-x  47 ayushsaxena  staff  1504 Oct 10 15:34 lib
> drwxr-xr-x  28 ayushsaxena  staff   896 Oct 10 15:34 sources
> drwxr-xr-x   3 ayushsaxena  staff    96 Oct 10 15:34 timelineservice
> drwxr-xr-x   3 ayushsaxena  staff    96 Oct 10 15:34 webapps
> drwxr-xr-x   6 ayushsaxena  staff   192 Oct 10 15:34 yarn-service-examples
> ayushsaxena@ayushsaxena hadoop-3.4.1 %
> ```
>
> Can someone help double check this once? I downloaded the 3.4.0 binary
> & this is not the case over there. I did a build locally on aarch64
> and there also the jars are present in the share/hadoop/yarn dir
>
> -Ayush
>
> On Fri, 18 Oct 2024 at 11:57, Mukund Madhav Thakur <mtha...@cloudera.com> wrote:
> >
> > Hi Ayush,
> > "lean" tar is a small tar file which doesn't contain the AWS SDK because the size of AWS SDK is itself 500 MB.
> > This can ease usage for non AWS users. Even AWS users can add this jar explicitly if desired.
> >
> > This is created using https://github.com/apache/hadoop-release-support which is the new and easy way to create and validate hadoop releases
> > created by Steve during 3.4.0. Also the link to this is present in [1] .
> > There was a discussion around this during the RC2 testing.
> >
> > I will check with Steve on why we don't have the equivalent for aarch64.
> >
> > [1] https://cwiki.apache.org/confluence/display/HADOOP2/HowToRelease
> >
> > On Thu, Oct 17, 2024 at 10:00 PM Ayush Saxena <ayush...@gmail.com> wrote:
> >>
> >> What exactly is this "lean tar"?
> >> I couldn’t find any mention of it in
> >> [1], nor did I come across any thread establishing consensus on adding
> >> it, as we did for the aarch64 tar. It’s also not clear to me from the
> >> create-release script how it is getting generated. So, where is this
> >> coming from? Additionally, why don’t we have an equivalent for
> >> aarch64?
> >>
> >> -Ayush
> >>
> >> [1] https://cwiki.apache.org/confluence/display/HADOOP2/HowToRelease
> >>
> >> On Thu, 17 Oct 2024 at 13:21, Mukund Madhav Thakur
> >> <mtha...@cloudera.com.invalid> wrote:
> >> >
> >> > Adding my +1 (binding) after all the testing.
> >> > Ran a lot of validation workflows from hadoop-release-support; all went
> >> > fine.
> >> > Verified checksum and signatures.
> >> > Built ORC and reran the failing vectored IO tests, which succeeded.
> >> > Built Parquet and ran the hadoop parquet module tests.
> >> > @Steve Loughran <ste...@apache.org> Built GCS and ran all the tests, which
> >> > went fine.
> >> >
> >> > Also Ahmar is a committer so his vote can be counted as +1 as well.
> >> >
> >> > Thanks everyone for testing! Need just one more binding vote for this to be
> >> > released.
> >> >
> >> > On Thu, Oct 17, 2024 at 10:01 AM Sneha Vijayarajan <sneha.vijayara...@gmail.com> wrote:
> >> >
> >> > > +1 (non-binding)
> >> > >
> >> > > Ran the ABFS driver tests and all the tests passed.
> >> > >
> >> > > ============================================================
> >> > > HNS-OAuth
> >> > > ============================================================
> >> > >
> >> > > [WARNING] Tests run: 154, Failures: 0, Errors: 0, Skipped: 2
> >> > > [WARNING] Tests run: 646, Failures: 0, Errors: 0, Skipped: 82
> >> > > [WARNING] Tests run: 433, Failures: 0, Errors: 0, Skipped: 57
> >> > >
> >> > > ============================================================
> >> > > HNS-SharedKey
> >> > > ============================================================
> >> > >
> >> > > [WARNING] Tests run: 154, Failures: 0, Errors: 0, Skipped: 3
> >> > > [WARNING] Tests run: 646, Failures: 0, Errors: 0, Skipped: 34
> >> > > [WARNING] Tests run: 433, Failures: 0, Errors: 0, Skipped: 44
> >> > >
> >> > > ============================================================
> >> > > NonHNS-SharedKey
> >> > > ============================================================
> >> > >
> >> > > [WARNING] Tests run: 154, Failures: 0, Errors: 0, Skipped: 9
> >> > > [WARNING] Tests run: 630, Failures: 0, Errors: 0, Skipped: 274
> >> > > [WARNING] Tests run: 433, Failures: 0, Errors: 0, Skipped: 47
> >> > >
> >> > > ============================================================
> >> > > AppendBlob-HNS-OAuth
> >> > > ============================================================
> >> > >
> >> > > [WARNING] Tests run: 154, Failures: 0, Errors: 0, Skipped: 2
> >> > > [WARNING] Tests run: 646, Failures: 0, Errors: 0, Skipped: 84
> >> > > [WARNING] Tests run: 433, Failures: 0, Errors: 0, Skipped: 81
> >> > >
> >> > > Thanks,
> >> > > Sneha Vijayarajan
> >> > >
> >> > > On Thu, Oct 17, 2024 at 8:57 AM Xiaoqiao He <hexiaoq...@apache.org> wrote:
> >> > > >
> >> > > > Update my vote to +1(binding).
> >> > > >
> >> > > > Addendum: Verified signature was correct for both src/binary/site tarball.
> >> > > >
> >> > > > On Mon, Oct 14, 2024 at 5:26 PM Xiaoqiao He <hexiaoq...@apache.org> wrote:
> >> > > >
> >> > > > > Thanks Mukund and Steve for driving this release.
> >> > > > >
> >> > > > > +0. Will +1 when signature check passed.
> >> > > > >
> >> > > > > [Y] LICENSE files exist and NOTICE is included.
> >> > > > > [Y] Rat check is ok. mvn clean apache-rat:check
> >> > > > > [Y] Build the source code on Ubuntu and OpenJDK 11 by `mvn clean package -DskipTests -Pnative -Pdist -Dtar`.
> >> > > > > [Y] Setup pseudo cluster with HDFS and YARN.
> >> > > > > [Y] Run simple FsShell - mkdir/put/get/mv/rm and check the result.
> >> > > > > [Y] Run example mr jobs and check the result - Pi & wordcount.
> >> > > > > [Y] Spot-check and run some unit tests.
> >> > > > > [Y] Skimmed the Web UI of NameNode/DataNode/Resourcemanager/NodeManager etc.
> >> > > > > [Y] Skimmed over the contents of site documentation.
> >> > > > > [Y] Skimmed over the contents of maven repo.
> >> > > > > [N] Verified checksum passed but signature didn't pass. Did you use another key to sign?
> >> > > > >
> >> > > > >> hexiaoqiao@hexiaoqiao-ubuntu:~/Public/hadoop-3.4.1$ gpg --list-key | grep -B2 -A1 "Mukund"
> >> > > > >> pub rsa4096 2023-09-28 [SC] [expires: 2027-09-28]
> >> > > > >> 41C7034C031FB3AF4BBFB5B89F070EBA4202CEC1
> >> > > > >> uid [ unknown] Mukund Thakur <mtha...@apache.org>
> >> > > > >> sub rsa4096 2023-09-28 [E] [expires: 2027-09-28]
> >> > > > >> hexiaoqiao@hexiaoqiao-ubuntu:~/Public/hadoop-3.4.1$ gpg --verify hadoop-3.4.1.tar.gz.asc hadoop-3.4.1.tar.gz
> >> > > > >> gpg: Signature made Thu 10 Oct 2024 01:10:30 CST
> >> > > > >> gpg: using RSA key 53931DAA708291409958BD474D22BB7D32882201
> >> > > > >> gpg: Can't check signature: No public key
> >> > > > >
> >> > > > > Best Regards,
> >> > > > > - He Xiaoqiao
> >> > > > >
> >> > > > > On Fri, Oct 11, 2024 at 1:31 AM Steve Loughran <ste...@cloudera.com.invalid> wrote:
> >> > > > >
> >> > > > >> Hey, you did the -lean one too! nice
> >> > > > >>
> >> > > > >> Anyway, yes I'll test. I will try not to find new problems
> >> > > > >>
> >> > > > >> On Thu, 10 Oct 2024 at 12:18, Mukund Madhav Thakur
> >> > > > >> <mtha...@cloudera.com.invalid> wrote:
> >> > > > >>
> >> > > > >> > Apache Hadoop 3.4.1
> >> > > > >> >
> >> > > > >> > With help of Steve, I have put together a release candidate (RC3) for
> >> > > > >> > Hadoop 3.4.1.
> >> > > > >> >
> >> > > > >> > What we would like is for anyone who can to verify the tarballs, especially
> >> > > > >> > anyone who can try the arm64 binaries as we want to include them too.
> >> > > > >> >
> >> > > > >> > The RC is available at:
> >> > > > >> > https://dist.apache.org/repos/dist/dev/hadoop/hadoop-3.4.1-RC3/
> >> > > > >> >
> >> > > > >> > The git tag is release-3.4.1-RC3, commit
> >> > > > >> > 4d7825309348956336b8f06a08322b78422849b1
> >> > > > >> >
> >> > > > >> > The maven artifacts are staged at
> >> > > > >> > https://repository.apache.org/content/repositories/orgapachehadoop-1430
> >> > > > >> >
> >> > > > >> > You can find my public key at:
> >> > > > >> > https://dist.apache.org/repos/dist/release/hadoop/common/KEYS
> >> > > > >> >
> >> > > > >> > Change log
> >> > > > >> > https://dist.apache.org/repos/dist/dev/hadoop/hadoop-3.4.1-RC3/CHANGELOG.md
> >> > > > >> >
> >> > > > >> > Release notes
> >> > > > >> > https://dist.apache.org/repos/dist/dev/hadoop/hadoop-3.4.1-RC3/RELEASENOTES.md
> >> > > > >> >
> >> > > > >> > This is off branch-3.4.1
> >> > > > >> >
> >> > > > >> > Key changes include
> >> > > > >> >
> >> > > > >> > * Bulk Delete API. https://issues.apache.org/jira/browse/HADOOP-18679
> >> > > > >> > * Fixes and enhancements in Vectored IO API.
> >> > > > >> > * Improvements in Hadoop Azure connector.
> >> > > > >> > * Fixes and improvements post upgrade to AWS V2 SDK in S3AConnector.
> >> > > > >> > * This release includes Arm64 binaries. Please can anyone with
> >> > > > >> >   compatible systems validate these.
> >> > > > >> >
> >> > > > >> > Note, because the arm64 binaries are built separately on a different
> >> > > > >> > platform and JVM, their jar files may not match those of the x86
> >> > > > >> > release, and therefore the maven artifacts. I don't think this is
> >> > > > >> > an issue (the ASF actually releases source tarballs, the binaries are
> >> > > > >> > there for help only, though with the maven repo that's a bit blurred).
> >> > > > >> >
> >> > > > >> > The only way to be consistent would be to actually untar the x86.tar.gz,
> >> > > > >> > overwrite its binaries with the arm stuff, retar, sign and push out
> >> > > > >> > for the vote. Even automating that would be risky.
> >> > > > >> >
> >> > > > >> > Please try the release and vote. The vote will run for 5 days.
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
> >> For additional commands, e-mail: common-dev-h...@hadoop.apache.org