[ https://issues.apache.org/jira/browse/HADOOP-10986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106105#comment-14106105 ]
Karthik Kambatla commented on HADOOP-10986:
-------------------------------------------

Thanks for the investigation, Alejandro.

I see why this happened - the script (create-release.sh) was doing mvn install with -Pdocs option, on top of which I copied the mvn site output as well. We can fix the script to not create javadocs during install, and use what we get from site. I tried this locally and the binary tarball is much smaller.

I propose we handle this also under HADOOP-10956, and close this as a duplicate.

> hadoop tarball is twice as big as prev. version and 6 times as big unpacked
> ---------------------------------------------------------------------------
>
>                 Key: HADOOP-10986
>                 URL: https://issues.apache.org/jira/browse/HADOOP-10986
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 2.5.0
>            Reporter: André Kelpe
>            Assignee: Karthik Kambatla
>            Priority: Blocker
>
> I noticed that the binary tarball for 2.5.0 is almost 300MB, while 2.4.1 is
> only 132MB. Unpacking the latest tarball gives me 1.8 GB of stuff, with the
> majority in the "share" directory.
>
> {code}
> $ cd hadoop-2.4.1
> $ du -sh *
> 364K    bin
> 356K    etc
> 100K    include
> 2,3M    lib
> 128K    libexec
> 24K     LICENSE.txt
> 12K     NOTICE.txt
> 12K     README.txt
> 336K    sbin
> 280M    share
> {code}
> {code}
> $ cd hadoop-2.5.0
> $ du -sh *
> 512K    bin
> 332K    etc
> 100K    include
> 4,6M    lib
> 128K    libexec
> 336K    sbin
> 1,8G    share
> {code}
> I also saw some warnings from tar while unpacking:
> {code}
> $ tar xf hadoop-2.5.0.tar.gz
> tar: Ignoring unknown extended header keyword `SCHILY.dev'
> tar: Ignoring unknown extended header keyword `SCHILY.ino'
> tar: Ignoring unknown extended header keyword `SCHILY.nlink'
> tar: Ignoring unknown extended header keyword `SCHILY.dev'
> tar: Ignoring unknown extended header keyword `SCHILY.ino'
> tar: Ignoring unknown extended header keyword `SCHILY.nlink'
> tar: Ignoring unknown extended header keyword `SCHILY.dev'
> tar: Ignoring unknown extended header keyword `SCHILY.ino'
> tar: Ignoring unknown extended header keyword `SCHILY.nlink'
> tar: Ignoring unknown extended header keyword `SCHILY.dev'
> tar: Ignoring unknown extended header keyword `SCHILY.ino'
> tar: Ignoring unknown extended header keyword `SCHILY.nlink'
> tar: Ignoring unknown extended header keyword `SCHILY.dev'
> tar: Ignoring unknown extended header keyword `SCHILY.ino'
> tar: Ignoring unknown extended header keyword `SCHILY.nlink'
> tar: Ignoring unknown extended header keyword `SCHILY.dev'
> tar: Ignoring unknown extended header keyword `SCHILY.ino'
> tar: Ignoring unknown extended header keyword `SCHILY.nlink'
> {code}
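
The per-directory comparison in the report (du -sh on each unpacked release) can also be approximated without unpacking, which may help when checking whether a rebuilt tarball has shrunk. The following is a minimal sketch and not part of create-release.sh: the function name top_level_sizes and the strip_components parameter are hypothetical, and it assumes member paths of the form hadoop-2.5.0/<entry>/... with no leading "./". It sums uncompressed file sizes, so the totals will not match du exactly, since du reports disk blocks.

```python
import sys
import tarfile
from collections import defaultdict


def top_level_sizes(fileobj, strip_components=1):
    """Sum uncompressed file sizes per top-level entry of a tar archive.

    fileobj is a seekable binary file object; strip_components drops that
    many leading path components (1 removes the 'hadoop-x.y.z/' prefix).
    """
    sizes = defaultdict(int)
    # "r:*" lets tarfile detect the compression (gzip, bzip2, xz, or none).
    with tarfile.open(fileobj=fileobj, mode="r:*") as archive:
        for member in archive:
            if not member.isfile():
                continue  # skip directories, symlinks, and special entries
            parts = member.name.split("/")[strip_components:]
            if parts:
                sizes[parts[0]] += member.size
    return dict(sizes)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python tar_sizes.py hadoop-2.5.0.tar.gz
    with open(sys.argv[1], "rb") as f:
        totals = top_level_sizes(f)
    for name, size in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{size / 2**20:10.1f} MiB  {name}")
```

Running this against the old and new tarballs and comparing the "share" totals would show whether the duplicated documentation is gone.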