[ https://issues.apache.org/jira/browse/HBASE-911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12636673#action_12636673 ]
stack commented on HBASE-911:
-----------------------------

I took a look. Blocks are not all 64MB in size. The last block in a file is the size of the file's tail.

I set up a clean hdfs on four nodes, then took the size of the dfs directory on each:
{code}
[branch-0.18]$ for i in `cat conf/slaves`; do ssh $i "du -sb /bfd/hadoop-stack/dfs"; done
37527   /bfd/hadoop-stack/dfs
20795   /bfd/hadoop-stack/dfs
20795   /bfd/hadoop-stack/dfs
20794   /bfd/hadoop-stack/dfs
{code}
Next I uploaded a file of 98 bytes into hdfs:
{code}
[branch-0.18]$ ls -la /tmp/xxxx.txt
-rw-r--r--  1 stack powerset  98 Sep 26 23:54 /tmp/xxxx.txt
[EMAIL PROTECTED] branch-0.18]$ ./bin/hadoop fs -put /tmp/xxxx.txt /
{code}
Then I ran the listing again:
{code}
[branch-0.18]$ for i in `cat conf/slaves`; do ssh $i "du -sb /bfd/hadoop-stack/dfs"; done
37840   /bfd/hadoop-stack/dfs
20904   /bfd/hadoop-stack/dfs
20904   /bfd/hadoop-stack/dfs
20794   /bfd/hadoop-stack/dfs
{code}
Sizes changed in three locations, one per replica. Listing the dfs data directory on one of the replicas, I see a block of size 98 bytes and some accompanying metadata:
{code}
[branch-0.18]$ ls -la /bfd/hadoop-stack/dfs/data/current/
total 20
drwxr-sr-x  2 stack powerset 4096 Oct  3 16:40 .
drwxr-sr-x  5 stack powerset 4096 Oct  3 16:39 ..
-rw-r--r--  1 stack powerset  158 Oct  3 16:39 VERSION
-rw-r--r--  1 stack powerset   98 Oct  3 16:40 blk_-343955609951300745
-rw-r--r--  1 stack powerset   11 Oct  3 16:40 blk_-343955609951300745_1001.meta
-rw-r--r--  1 stack powerset    0 Oct  3 16:39 dncp_block_verification.log.curr
{code}

> Minimize filesystem footprint
> -----------------------------
>
>          Key: HBASE-911
>          URL: https://issues.apache.org/jira/browse/HBASE-911
>      Project: Hadoop HBase
>   Issue Type: Improvement
>     Reporter: stack
>
> This issue is about looking into how much space in the filesystem hbase uses.
> Daniel Ploeg suggests that hbase is profligate in its use of space in hdfs.
> Given that block sizes by default are 64MB, and that every time hbase writes
> a store file it is accompanied by an index file and a very small metadata
> file, that's 3*64MB even if the file is empty (TODO: Prove this). The
> situation is aggravated by the fact that hbase does a flush of whatever is in
> memory every 30 minutes to minimize loss in the absence of appends; this
> latter action makes for lots of small files.
> The solution to the above is to implement append, so the optional flush is not
> necessary, and a file format that aggregates info, index and data all in the
> one file. Short-term, we should set the block size on the info/metadata file
> down to 4k or some such small size and look into doing likewise for the
> mapfile index.
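To summarize the arithmetic the listings above demonstrate, here is a minimal sketch (all names and the simplified model are mine, not anything from the HDFS codebase): a file's raw on-disk footprint is its byte length times the replication factor, because the last (or only) block is only as large as the file's tail — blocks are not preallocated at the full dfs.block.size. The per-file cost that does scale with file count is the number of block/namenode entries, not wasted 64MB blocks.

```python
DEFAULT_BLOCK_SIZE = 64 * 2**20  # dfs.block.size default (64MB)
DEFAULT_REPLICATION = 3          # dfs.replication default


def raw_footprint(file_bytes, replication=DEFAULT_REPLICATION):
    """Bytes of block data stored across all datanodes for one file.
    A 98-byte file costs ~98 bytes per replica, not 64MB per replica."""
    return file_bytes * replication


def block_count(file_bytes, block_size=DEFAULT_BLOCK_SIZE):
    """Blocks (hence namenode objects) one file occupies; even an
    empty file still costs one block entry."""
    return max(1, (file_bytes + block_size - 1) // block_size)


# The 98-byte upload above: raw bytes across three replicas, one block.
print(raw_footprint(98))   # 294
print(block_count(98))     # 1
# A flush writing data + index + metadata still creates three files,
# and so three block entries, however small the files are.
print(3 * block_count(0))  # 3
```

This matches the du output: each of the three replicas grew by roughly a hundred bytes (the 98-byte block plus the 11-byte .meta checksum file and directory overhead), not by 64MB.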