[
https://issues.apache.org/jira/browse/HDFS-1612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Joe Crobak updated HDFS-1612:
-----------------------------
Attachment: HDFS-1612.patch
First pass at an update to the hdfs_design doc.
Changes:
* "scale to hundreds of nodes" -> "scale to thousands of nodes"
* Changes to reflect append and hflush features.
* Mention support for user quotas.
* Fixed a typo -- a stray "gg/".
* Mention checkpoint and backup nodes.
There are a few other things that might be updated:
* "HDFS does not currently support snapshots but will in a future release" --
but HDFS-233 hasn't been updated since June 2010.
* "Work is in progress to expose HDFS through the WebDAV protocol" -- should
this reference https://github.com/huyphan/HDFS-over-Webdav, or be removed?
HDFS-225 hasn't been updated since August 2009.
* It's unclear to me whether the rebalancing section needs to be updated. The
Hadoop balancer is a manual process, AFAIK, so what is there is technically
accurate.
> HDFS Design Documentation is outdated
> -------------------------------------
>
> Key: HDFS-1612
> URL: https://issues.apache.org/jira/browse/HDFS-1612
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: documentation
> Affects Versions: 0.20.2, 0.21.0
> Environment:
> http://hadoop.apache.org/hdfs/docs/current/hdfs_design.html#The+Persistence+of+File+System+Metadata
> http://hadoop.apache.org/common/docs/r0.20.2/hdfs_design.html#The+Persistence+of+File+System+Metadata
> Reporter: Joe Crobak
> Priority: Minor
> Attachments: HDFS-1612.patch
>
>
> I was trying to discover details about the Secondary NameNode, and came
> across the description below in the HDFS design doc.
> {quote}
> The NameNode keeps an image of the entire file system namespace and file
> Blockmap in memory. This key metadata item is designed to be compact, such
> that a NameNode with 4 GB of RAM is plenty to support a huge number of files
> and directories. When the NameNode starts up, it reads the FsImage and
> EditLog from disk, applies all the transactions from the EditLog to the
> in-memory representation of the FsImage, and flushes out this new version
> into a new FsImage on disk. It can then truncate the old EditLog because its
> transactions have been applied to the persistent FsImage. This process is
> called a checkpoint. *In the current implementation, a checkpoint only occurs
> when the NameNode starts up. Work is in progress to support periodic
> checkpointing in the near future.*
> {quote}
> (emphasis mine).
> Note that this directly conflicts with information in the HDFS user guide,
> http://hadoop.apache.org/common/docs/r0.20.2/hdfs_user_guide.html#Secondary+NameNode
> and
> http://hadoop.apache.org/hdfs/docs/current/hdfs_user_guide.html#Checkpoint+Node
> I haven't done a thorough audit of that doc -- I only noticed the above
> inaccuracy.
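
For readers unfamiliar with the checkpoint step described in the quoted passage
(read the FsImage and EditLog, apply the logged transactions, write a new
FsImage, truncate the EditLog), here is a minimal sketch of the idea. It is a
toy model only: the dict-based image, the tuple-based edits, and the names
`apply_edit` and `checkpoint` are all hypothetical and do not reflect the real
NameNode code or its on-disk formats.

```python
from typing import Dict, List, Tuple

# Hypothetical toy types: an image maps a path to its list of block IDs, and an
# edit is an (operation, path, block_ids) tuple. Real HDFS formats differ.
FsImage = Dict[str, List[int]]
Edit = Tuple[str, str, List[int]]


def apply_edit(image: FsImage, edit: Edit) -> None:
    """Apply one logged transaction to the in-memory image."""
    op, path, blocks = edit
    if op == "create":
        image[path] = list(blocks)
    elif op == "delete":
        image.pop(path, None)
    else:
        raise ValueError(f"unknown operation: {op}")


def checkpoint(fsimage: FsImage, edit_log: List[Edit]) -> Tuple[FsImage, List[Edit]]:
    """Replay the edit log onto a copy of the image.

    Returns the merged image and an empty log, mirroring "flush a new FsImage,
    then truncate the old EditLog".
    """
    merged = {path: list(blocks) for path, blocks in fsimage.items()}
    for edit in edit_log:
        apply_edit(merged, edit)
    return merged, []


old_image = {"/a": [1, 2]}
log = [("create", "/b", [3]), ("delete", "/a", [])]
new_image, new_log = checkpoint(old_image, log)
print(new_image, new_log)
```

The sketch copies the old image rather than mutating it, so the previous
checkpoint stays intact until the merged one has been fully built.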