[ https://issues.apache.org/jira/browse/HBASE-911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12636673#action_12636673 ]
stack commented on HBASE-911:
-----------------------------

I took a look. Blocks are not all 64MB in size. The last block in a file is the size of the file's tail.

I set up a clean hdfs on four nodes, then took the size of the dfs directory on each:
{code}
[branch-0.18]$ for i in `cat conf/slaves`; do ssh $i "du -sb /bfd/hadoop-stack/dfs"; done
37527   /bfd/hadoop-stack/dfs
20795   /bfd/hadoop-stack/dfs
20795   /bfd/hadoop-stack/dfs
20794   /bfd/hadoop-stack/dfs
{code}
Next I uploaded a file of 98 bytes into hdfs:
{code}
[branch-0.18]$ ls -la /tmp/xxxx.txt
-rw-r--r--  1 stack powerset  98 Sep 26 23:54 /tmp/xxxx.txt
[EMAIL PROTECTED] branch-0.18]$ ./bin/hadoop fs -put /tmp/xxxx.txt /
{code}
Then I ran the listing again:
{code}
[branch-0.18]$ for i in `cat conf/slaves`; do ssh $i "du -sb /bfd/hadoop-stack/dfs"; done
37840   /bfd/hadoop-stack/dfs
20904   /bfd/hadoop-stack/dfs
20904   /bfd/hadoop-stack/dfs
20794   /bfd/hadoop-stack/dfs
{code}
Sizes changed in three locations, one per replica. Listing the dfs data directory on one of the replicas, I see a block of size 98 bytes and some accompanying metadata:
{code}
[branch-0.18]$ ls -la /bfd/hadoop-stack/dfs/data/current/
total 20
drwxr-sr-x  2 stack powerset 4096 Oct  3 16:40 .
drwxr-sr-x  5 stack powerset 4096 Oct  3 16:39 ..
-rw-r--r--  1 stack powerset  158 Oct  3 16:39 VERSION
-rw-r--r--  1 stack powerset   98 Oct  3 16:40 blk_-343955609951300745
-rw-r--r--  1 stack powerset   11 Oct  3 16:40 blk_-343955609951300745_1001.meta
-rw-r--r--  1 stack powerset    0 Oct  3 16:39 dncp_block_verification.log.curr
{code}

> Minimize filesystem footprint
> -----------------------------
>
>          Key: HBASE-911
>          URL: https://issues.apache.org/jira/browse/HBASE-911
>      Project: Hadoop HBase
>   Issue Type: Improvement
>     Reporter: stack
>
> This issue is about looking into how much space in the filesystem hbase uses.
> Daniel Ploeg suggests that hbase is profligate in its use of space in hdfs.
> Given that block sizes by default are 64MB, and that every time hbase writes
> a store file it is accompanied by an index file and a very small metadata
> file, that's 3*64MB even if the file is empty (TODO: Prove this). The
> situation is aggravated by the fact that hbase does a flush of whatever is in
> memory every 30 minutes to minimize loss in the absence of appends; this
> latter action makes for lots of small files.
> The solution to the above is to implement append, so the optional flush is not
> necessary, and a file format that aggregates info, index and data all in the
> one file. Short-term, we should set the block size on the info/metadata file
> down to 4k or some such small size and look into doing likewise for the
> mapfile index.
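To summarize the arithmetic the listings above demonstrate, here is a minimal sketch (all names and the simplified model are mine, not anything from the HDFS codebase): a file's raw on-disk footprint is its byte length times the replication factor, because the last (or only) block is only as large as the file's tail — blocks are not preallocated at the full dfs.block.size. The per-file cost that does scale with file count is the number of block/namenode entries, not wasted 64MB blocks.

```python
DEFAULT_BLOCK_SIZE = 64 * 2**20  # dfs.block.size default (64MB)
DEFAULT_REPLICATION = 3          # dfs.replication default


def raw_footprint(file_bytes, replication=DEFAULT_REPLICATION):
    """Bytes of block data stored across all datanodes for one file.
    A 98-byte file costs ~98 bytes per replica, not 64MB per replica."""
    return file_bytes * replication


def block_count(file_bytes, block_size=DEFAULT_BLOCK_SIZE):
    """Blocks (hence namenode objects) one file occupies; even an
    empty file still costs one block entry."""
    return max(1, (file_bytes + block_size - 1) // block_size)


# The 98-byte upload above: raw bytes across three replicas, one block.
print(raw_footprint(98))   # 294
print(block_count(98))     # 1
# A flush writing data + index + metadata still creates three files,
# and so three block entries, however small the files are.
print(3 * block_count(0))  # 3
```

This matches the du output: each of the three replicas grew by roughly a hundred bytes (the 98-byte block plus the 11-byte .meta checksum file and directory overhead), not by 64MB.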