[ 
https://issues.apache.org/jira/browse/HADOOP-10986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14105364#comment-14105364
 ] 

Alejandro Abdelnur commented on HADOOP-10986:
---------------------------------------------

The significant size increase appears to come from the documentation, 
specifically the protobuf javadocs:

{code}
$ cd hadoop-2.5.0/share/doc/hadoop
$ du -m -s *
55      api
119     common
1       css
1       dependency-analysis.html
1       hadoop-annotations
1       hadoop-archives
1       hadoop-assemblies
2       hadoop-auth
1       hadoop-auth-examples
1       hadoop-common-project
1       hadoop-datajoin
1       hadoop-dist
1       hadoop-distcp
1       hadoop-extras
1       hadoop-gridmix
1       hadoop-hdfs-bkjournal
11      hadoop-hdfs-httpfs
1       hadoop-hdfs-nfs
1       hadoop-hdfs-project
1       hadoop-mapreduce
3       hadoop-mapreduce-client
1       hadoop-mapreduce-examples
1       hadoop-maven-plugins
1       hadoop-minicluster
1       hadoop-minikdc
1       hadoop-nfs
1       hadoop-openstack
1       hadoop-pipes
725     hadoop-project-dist
1       hadoop-rumen
1       hadoop-sls
1       hadoop-streaming
1       hadoop-tools
5       hadoop-yarn
1       hadoop-yarn-project
618     hdfs
1       httpfs
1       images
1       index.html
1       mapreduce
1       project-reports.html
1       yarn
{code}

{code}
$ cd hadoop-2.5.0/share/doc/hadoop/
$ du -m -s hdfs/api/src-html/org/apache/hadoop/hdfs/server/namenode/
222     hdfs/api/src-html/org/apache/hadoop/hdfs/server/namenode/
{code}
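
To make the biggest offenders easier to spot, the same du output can be sorted by size. A minimal sketch, assuming a POSIX shell and a du that supports -m; the function name largest_entries is made up for illustration and is not part of any Hadoop tooling:

```shell
# Hypothetical helper: print the N largest entries directly under a directory,
# with sizes in megabytes, largest first.
# Usage: largest_entries DIR [N]
largest_entries() {
    dir="${1:-.}"
    count="${2:-5}"
    # du -m reports sizes in megabytes; -s prints one summary line per entry.
    # sort -rn orders by the leading numeric size, largest first.
    # The subshell keeps the caller's working directory unchanged.
    (cd "$dir" && du -m -s * | sort -rn | head -n "$count")
}
```

For example, largest_entries hadoop-2.5.0/share/doc/hadoop 4 should list only the four largest entries from the listing above.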

It also looks like the javadoc directories are duplicated:

{code}
$ cd hadoop-2.5.0/share/doc/hadoop/
$ find . -name api -type d
./api
./api/org/apache/hadoop/mapreduce/v2/api
./api/org/apache/hadoop/yarn/api
./api/org/apache/hadoop/yarn/client/api
./api/src-html/org/apache/hadoop/yarn/api
./api/src-html/org/apache/hadoop/yarn/client/api
./common/api
./hadoop-project-dist/hadoop-common/api
./hadoop-project-dist/hadoop-hdfs/api
./hdfs/api
{code}
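
Whether two of these api directories are really byte-for-byte copies could be checked with a recursive diff. A minimal sketch, assuming a diff that supports -r and -q; the function name same_tree is hypothetical, and the output above does not itself confirm that any pair is identical:

```shell
# Hypothetical helper: report whether two directory trees have identical contents.
# Usage: same_tree DIR_A DIR_B
same_tree() {
    # diff -r recurses into subdirectories; -q only reports that files differ.
    # diff exits 0 when the trees match and non-zero when they differ or
    # cannot be compared (for example, a missing directory).
    if diff -rq "$1" "$2" >/dev/null 2>&1; then
        echo "identical: $1 $2"
    else
        echo "differs or could not be compared: $1 $2"
        return 1
    fi
}
```

Run from share/doc/hadoop, a candidate pair to compare would be same_tree common/api hadoop-project-dist/hadoop-common/api.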


> hadoop tarball is twice as big as prev. version and 6 times as big unpacked
> ---------------------------------------------------------------------------
>
>                 Key: HADOOP-10986
>                 URL: https://issues.apache.org/jira/browse/HADOOP-10986
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 2.5.0
>            Reporter: André Kelpe
>            Assignee: Karthik Kambatla
>            Priority: Blocker
>
> I noticed that the binary tarball for 2.5.0 is almost 300MB, while 2.4.1 is 
> only 132MB. Unpacking the latest tarball gives me 1.8 GB of stuff, with the 
> majority in the "share" directory.
>  
> {code}
> $ cd hadoop-2.4.1
> $ du -sh *
> 364K    bin
> 356K    etc
> 100K    include
> 2,3M    lib
> 128K    libexec
> 24K     LICENSE.txt
> 12K     NOTICE.txt
> 12K     README.txt
> 336K    sbin
> 280M    share
> {code}
> {code}
> $ cd hadoop-2.5.0
> $ du -sh *
> 512K    bin
> 332K    etc
> 100K    include
> 4,6M    lib
> 128K    libexec
> 336K    sbin
> 1,8G    share
> {code}
> I also saw some warnings from tar while unpacking:
> {code}
> $ tar xf hadoop-2.5.0.tar.gz 
> tar: Ignoring unknown extended header keyword `SCHILY.dev'
> tar: Ignoring unknown extended header keyword `SCHILY.ino'
> tar: Ignoring unknown extended header keyword `SCHILY.nlink'
> tar: Ignoring unknown extended header keyword `SCHILY.dev'
> tar: Ignoring unknown extended header keyword `SCHILY.ino'
> tar: Ignoring unknown extended header keyword `SCHILY.nlink'
> tar: Ignoring unknown extended header keyword `SCHILY.dev'
> tar: Ignoring unknown extended header keyword `SCHILY.ino'
> tar: Ignoring unknown extended header keyword `SCHILY.nlink'
> tar: Ignoring unknown extended header keyword `SCHILY.dev'
> tar: Ignoring unknown extended header keyword `SCHILY.ino'
> tar: Ignoring unknown extended header keyword `SCHILY.nlink'
> tar: Ignoring unknown extended header keyword `SCHILY.dev'
> tar: Ignoring unknown extended header keyword `SCHILY.ino'
> tar: Ignoring unknown extended header keyword `SCHILY.nlink'
> tar: Ignoring unknown extended header keyword `SCHILY.dev'
> tar: Ignoring unknown extended header keyword `SCHILY.ino'
> tar: Ignoring unknown extended header keyword `SCHILY.nlink'
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)
