[ 
https://issues.apache.org/jira/browse/HDFS-1612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe Crobak updated HDFS-1612:
-----------------------------

    Attachment: HDFS-1612.patch

First pass at an update to the hdfs_design doc.

Changes:
* "scale to hundreds of nodes" -> "scale to thousands of nodes"
* Changes to reflect append and hflush features.
* Mention support for user quotas.
* Fixed a typo -- stray gg/
* Mention checkpoint and backup nodes.

There are a few other things that might be updated:
* "HDFS does not currently support snapshots but will in a future release" -- 
but HDFS-233 hasn't been updated since June 2010.
* "Work is in progress to expose HDFS through the WebDAV protocol" -- should we 
either reference https://github.com/huyphan/HDFS-over-Webdav or remove this 
claim? HDFS-225 hasn't been updated since August 2009.
* It's unclear to me whether the rebalancing section needs to be updated. The 
Hadoop balancer is a manual process, AFAIK, so what is there is technically 
accurate.

> HDFS Design Documentation is outdated
> -------------------------------------
>
>                 Key: HDFS-1612
>                 URL: https://issues.apache.org/jira/browse/HDFS-1612
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: documentation
>    Affects Versions: 0.20.2, 0.21.0
>         Environment: 
> http://hadoop.apache.org/hdfs/docs/current/hdfs_design.html#The+Persistence+of+File+System+Metadata
> http://hadoop.apache.org/common/docs/r0.20.2/hdfs_design.html#The+Persistence+of+File+System+Metadata
>            Reporter: Joe Crobak
>            Priority: Minor
>         Attachments: HDFS-1612.patch
>
>
> I was trying to discover details about the Secondary NameNode, and came 
> across the description below in the HDFS design doc.
> {quote}
> The NameNode keeps an image of the entire file system namespace and file 
> Blockmap in memory. This key metadata item is designed to be compact, such 
> that a NameNode with 4 GB of RAM is plenty to support a huge number of files 
> and directories. When the NameNode starts up, it reads the FsImage and 
> EditLog from disk, applies all the transactions from the EditLog to the 
> in-memory representation of the FsImage, and flushes out this new version 
> into a new FsImage on disk. It can then truncate the old EditLog because its 
> transactions have been applied to the persistent FsImage. This process is 
> called a checkpoint. *In the current implementation, a checkpoint only occurs 
> when the NameNode starts up. Work is in progress to support periodic 
> checkpointing in the near future.*
> {quote}
> (emphasis mine).
> Note that this directly conflicts with information in the hdfs user guide, 
> http://hadoop.apache.org/common/docs/r0.20.2/hdfs_user_guide.html#Secondary+NameNode
> and 
> http://hadoop.apache.org/hdfs/docs/current/hdfs_user_guide.html#Checkpoint+Node
> I haven't done a thorough audit of that doc -- I only noticed the above 
> inaccuracy.
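
For anyone skimming, the FsImage/EditLog roll-up the quoted passage describes 
amounts to replaying a transaction log over an in-memory namespace and then 
persisting the merged result. A minimal conceptual sketch (illustrative only -- 
not Hadoop's code; the operation names and log format here are made up):

```python
def checkpoint(fsimage, edit_log):
    """Sketch of the checkpoint step: apply every logged namespace
    transaction to the in-memory image, then start a fresh edit log.
    The returned image stands in for the new FsImage flushed to disk;
    the empty list stands in for the truncated EditLog."""
    image = dict(fsimage)  # in-memory namespace: path -> metadata
    for op, path, value in edit_log:
        if op == "create":
            image[path] = value
        elif op == "delete":
            image.pop(path, None)
    # Once the merged image is durable, the old EditLog can be
    # truncated, because its transactions are now reflected in it.
    return image, []

new_image, new_log = checkpoint(
    {"/a": "file"},
    [("create", "/b", "file"), ("delete", "/a", None)],
)
```

The point of the bug report is only *when* this runs: the design doc still says 
startup-only, while the user guide describes it running periodically.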

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
