[ http://issues.apache.org/jira/browse/HADOOP-50?page=all ]

Mike Cafarella updated HADOOP-50:
---------------------------------

    Attachment: hadoop.50.patch.1


This addresses the problem by storing blocks in multiple directories.  It
lazily creates a single level of 512 subdirectories, and each block is
assigned to a subdirectory according to the lower 9 bits of its block id.
If a single level ever proves too shallow, it is easy to add another
subdirectory layer that selects on the next-lowest 9 bits of the block id.
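
For concreteness, here is a minimal Java sketch of the subdirectory
selection just described; the class and method names are illustrative,
not the actual patch code:

    import java.io.File;

    public class BlockDirs {
        private static final int NUM_SUBDIRS = 512;               // 2^9 directories
        private static final long SUBDIR_MASK = NUM_SUBDIRS - 1;  // mask for the lower 9 bits

        /** Map a block id to its subdirectory, creating the directory lazily. */
        public static File getSubdir(File dataDir, long blockId) {
            int idx = (int) (blockId & SUBDIR_MASK);   // lower 9 bits select the subdir
            File subdir = new File(dataDir, "subdir" + idx);
            if (!subdir.exists()) {
                subdir.mkdirs();                       // created on first use
            }
            return subdir;
        }
    }

For scale: with 32MB blocks, a terabyte of data is about 32,768 blocks,
or roughly 64 files per subdirectory.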

This change is backwards-compatible with the previous block layout.
Old blocks in the flat, single-directory layout are kept there as-is;
we don't migrate them.  New blocks are always written into the new
hierarchy.

If both layouts are present, we always check the new one first, and fall
back to the old one on a miss.  (The check against the new layout should
be faster, so we do it first.)
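
As a rough illustration, a lookup along these lines might look like the
following Java sketch; the file-naming convention and helper names here
are assumptions, not the patch itself:

    import java.io.File;

    public class BlockLookup {
        /** Probe the new subdirectory layout first, then the old flat layout. */
        public static File findBlockFile(File dataDir, long blockId) {
            String name = "blk_" + blockId;            // assumed block file naming
            int idx = (int) (blockId & 511L);          // lower 9 bits pick the subdir
            File inNew = new File(new File(dataDir, "subdir" + idx), name);
            if (inNew.exists()) {
                return inNew;                          // new layout: the common, faster case
            }
            File inOld = new File(dataDir, name);      // legacy single-directory layout
            return inOld.exists() ? inOld : null;      // null: block isn't stored here
        }
    }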

Please let me know if this patch works for you.

> dfs datanode should store blocks in multiple directories
> --------------------------------------------------------
>
>          Key: HADOOP-50
>          URL: http://issues.apache.org/jira/browse/HADOOP-50
>      Project: Hadoop
>         Type: Bug
>   Components: dfs
>     Versions: 0.2
>     Reporter: Doug Cutting
>     Assignee: Mike Cafarella
>      Fix For: 0.3
>  Attachments: hadoop.50.patch.1
>
> The datanode currently stores all file blocks in a single directory.  With
> 32MB blocks and terabyte filesystems, this puts more files in one directory
> than many filesystems handle well.  Blocks should therefore be stored in
> multiple directories, perhaps even a shallow hierarchy.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira
