Author: hairong
Date: Mon Jul 7 12:14:06 2008
New Revision: 674598
URL: http://svn.apache.org/viewvc?rev=674598&view=rev
Log:
Merge -r 674587:674588 from trunk to branch 0.18 to move the change log of HADOOP-3688
into the release 0.18.0 section
Modified:
hadoop/core/branches/branch-0.18/CHANGES.txt
hadoop/core/branches/branch-0.18/docs/changes.html
hadoop/core/branches/branch-0.18/docs/hdfs_design.html
hadoop/core/branches/branch-0.18/docs/hdfs_design.pdf
hadoop/core/branches/branch-0.18/docs/hdfs_quota_admin_guide.html
hadoop/core/branches/branch-0.18/docs/hdfs_quota_admin_guide.pdf
hadoop/core/branches/branch-0.18/docs/hdfs_user_guide.html
hadoop/core/branches/branch-0.18/docs/hdfs_user_guide.pdf
hadoop/core/branches/branch-0.18/src/docs/src/documentation/content/xdocs/hdfs_design.xml
hadoop/core/branches/branch-0.18/src/docs/src/documentation/content/xdocs/hdfs_quota_admin_guide.xml
hadoop/core/branches/branch-0.18/src/docs/src/documentation/content/xdocs/hdfs_user_guide.xml
Modified: hadoop/core/branches/branch-0.18/CHANGES.txt
URL: http://svn.apache.org/viewvc/hadoop/core/branches/branch-0.18/CHANGES.txt?rev=674598&r1=674597&r2=674598&view=diff
==============================================================================
--- hadoop/core/branches/branch-0.18/CHANGES.txt (original)
+++ hadoop/core/branches/branch-0.18/CHANGES.txt Mon Jul 7 12:14:06 2008
@@ -320,6 +320,8 @@
HADOOP-3100. Develop tests to test the DFS command line interface. (mukund)
+ HADOOP-3688. Fix up HDFS docs. (Robert Chansler via hairong)
+
OPTIMIZATIONS
HADOOP-3274. The default constructor of BytesWritable creates empty
Modified: hadoop/core/branches/branch-0.18/docs/changes.html
URL: http://svn.apache.org/viewvc/hadoop/core/branches/branch-0.18/docs/changes.html?rev=674598&r1=674597&r2=674598&view=diff
==============================================================================
--- hadoop/core/branches/branch-0.18/docs/changes.html (original)
+++ hadoop/core/branches/branch-0.18/docs/changes.html Mon Jul 7 12:14:06 2008
@@ -187,7 +187,7 @@
</ol>
</li>
<li><a
href="javascript:toggleList('release_0.18.0_-_unreleased_._improvements_')">
IMPROVEMENTS
-</a> (45)
+</a> (46)
<ol id="release_0.18.0_-_unreleased_._improvements_">
<li><a
href="http://issues.apache.org/jira/browse/HADOOP-2928">HADOOP-2928</a>. Remove
deprecated FileSystem.getContentLength().<br />(Lohit Vjayarenu via
rangadi)</li>
<li><a
href="http://issues.apache.org/jira/browse/HADOOP-3130">HADOOP-3130</a>. Make
the connect timeout smaller for getFile.<br />(Amar Ramesh Kamat via ddas)</li>
@@ -278,6 +278,7 @@
<li><a
href="http://issues.apache.org/jira/browse/HADOOP-3606">HADOOP-3606</a>.
Updates the Streaming doc.<br />(Amareshwari Sriramadasu via ddas)</li>
<li><a
href="http://issues.apache.org/jira/browse/HADOOP-3532">HADOOP-3532</a>. Add
jdiff reports to the build scripts.<br />(omalley)</li>
<li><a
href="http://issues.apache.org/jira/browse/HADOOP-3100">HADOOP-3100</a>.
Develop tests to test the DFS command line interface.<br />(mukund)</li>
+ <li><a
href="http://issues.apache.org/jira/browse/HADOOP-3688">HADOOP-3688</a>. Fix up
HDFS docs.<br />(Robert Chansler via hairong)</li>
</ol>
</li>
<li><a
href="javascript:toggleList('release_0.18.0_-_unreleased_._optimizations_')">
OPTIMIZATIONS
@@ -565,7 +566,7 @@
</ol>
</li>
<li><a
href="javascript:toggleList('release_0.17.1_-_unreleased_._bug_fixes_')"> BUG
FIXES
-</a> (4)
+</a> (5)
<ol id="release_0.17.1_-_unreleased_._bug_fixes_">
<li><a
href="http://issues.apache.org/jira/browse/HADOOP-1979">HADOOP-1979</a>. Speed
up fsck by adding a buffered stream.<br />(Lohit
Vijaya Renu via omalley)</li>
@@ -574,6 +575,8 @@
<li><a
href="http://issues.apache.org/jira/browse/HADOOP-3571">HADOOP-3571</a>. Fix
bug in block removal used in lease recovery.<br />(shv)</li>
<li><a
href="http://issues.apache.org/jira/browse/HADOOP-3645">HADOOP-3645</a>.
MetricsTimeVaryingRate returns wrong value for
metric_avg_time.<br />(Lohit Vijayarenu via hairong)</li>
+ <li><a
href="http://issues.apache.org/jira/browse/HADOOP-3633">HADOOP-3633</a>.
Correct exception handling in DataXceiveServer, and throttle
+the number of xceiver threads in a data-node.<br />(shv)</li>
</ol>
</li>
</ul>
Modified: hadoop/core/branches/branch-0.18/docs/hdfs_design.html
URL: http://svn.apache.org/viewvc/hadoop/core/branches/branch-0.18/docs/hdfs_design.html?rev=674598&r1=674597&r2=674598&view=diff
==============================================================================
--- hadoop/core/branches/branch-0.18/docs/hdfs_design.html (original)
+++ hadoop/core/branches/branch-0.18/docs/hdfs_design.html Mon Jul 7 12:14:06 2008
@@ -221,7 +221,7 @@
</ul>
</li>
<li>
-<a href="#Namenode+and+Datanodes"> Namenode and Datanodes </a>
+<a href="#NameNode+and+DataNodes"> NameNode and DataNodes </a>
</li>
<li>
<a href="#The+File+System+Namespace"> The File System Namespace </a>
@@ -236,7 +236,7 @@
<a href="#Replica+Selection"> Replica Selection </a>
</li>
<li>
-<a href="#SafeMode"> SafeMode </a>
+<a href="#Safemode"> Safemode </a>
</li>
</ul>
</li>
@@ -284,7 +284,7 @@
<a href="#Accessibility"> Accessibility </a>
<ul class="minitoc">
<li>
-<a href="#DFSShell"> DFSShell </a>
+<a href="#FS+Shell"> FS Shell </a>
</li>
<li>
<a href="#DFSAdmin"> DFSAdmin </a>
@@ -341,7 +341,7 @@
<a name="N1004A"></a><a name="Simple+Coherency+Model"></a>
<h3 class="h4"> Simple Coherency Model </h3>
<p>
- HDFS applications need a write-once-read-many access model for files.
A file once created, written, and closed need not be changed. This assumption
simplifies data coherency issues and enables high throughput data access. A
MapReduce application or a web crawler application fits perfectly with this
model. There is a plan to support appending-writes to files in the future.
+ HDFS applications need a write-once-read-many access model for files.
A file once created, written, and closed need not be changed. This assumption
simplifies data coherency issues and enables high throughput data access. A
Map/Reduce application or a web crawler application fits perfectly with this
model. There is a plan to support appending-writes to files in the future.
</p>
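
[Editor's note: for context, a minimal Java sketch of the write-once-read-many pattern described above; the path and the data written are hypothetical, while Configuration, FileSystem, Path, and FSDataOutputStream are the standard Hadoop client classes.]

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class WriteOnceExample {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();        // reads hadoop-site.xml from the classpath
        FileSystem fs = FileSystem.get(conf);            // the file system named by fs.default.name
        Path file = new Path("/user/demo/part-00000");   // hypothetical output path
        FSDataOutputStream out = fs.create(file);        // create and write the file exactly once
        out.writeBytes("written once, read many times\n");
        out.close();                                     // after close, the file is not changed again
      }
    }
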
<a name="N10054"></a><a
name="%E2%80%9CMoving+Computation+is+Cheaper+than+Moving+Data%E2%80%9D"></a>
<h3 class="h4"> “Moving Computation is Cheaper than Moving Data”
</h3>
@@ -357,51 +357,51 @@
-<a name="N10069"></a><a name="Namenode+and+Datanodes"></a>
-<h2 class="h3"> Namenode and Datanodes </h2>
+<a name="N10069"></a><a name="NameNode+and+DataNodes"></a>
+<h2 class="h3"> NameNode and DataNodes </h2>
<div class="section">
<p>
- HDFS has a master/slave architecture. An HDFS cluster consists of a
single <em>Namenode</em>, a master server that manages the file system
namespace and regulates access to files by clients. In addition, there are a
number of <em>Datanodes</em>, usually one per node in the cluster, which manage
storage attached to the nodes that they run on. HDFS exposes a file system
namespace and allows user data to be stored in files. Internally, a file is
split into one or more blocks and these blocks are stored in a set of
Datanodes. The Namenode executes file system namespace operations like opening,
closing, and renaming files and directories. It also determines the mapping of
blocks to Datanodes. The Datanodes are responsible for serving read and write
requests from the file system’s clients. The Datanodes also perform block
creation, deletion, and replication upon instruction from the Namenode.
+ HDFS has a master/slave architecture. An HDFS cluster consists of a
single NameNode, a master server that manages the file system namespace and
regulates access to files by clients. In addition, there are a number of
DataNodes, usually one per node in the cluster, which manage storage attached
to the nodes that they run on. HDFS exposes a file system namespace and allows
user data to be stored in files. Internally, a file is split into one or more
blocks and these blocks are stored in a set of DataNodes. The NameNode executes
file system namespace operations like opening, closing, and renaming files and
directories. It also determines the mapping of blocks to DataNodes. The
DataNodes are responsible for serving read and write requests from the file
system’s clients. The DataNodes also perform block creation, deletion,
and replication upon instruction from the NameNode.
</p>
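
[Editor's note: from a client's perspective, the single NameNode described above is just the file system address in the Hadoop configuration. A minimal sketch, assuming a hypothetical host name and the fs.default.name property used in this release line.]

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ListRootExample {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Hypothetical NameNode address; normally set once in hadoop-site.xml.
        conf.set("fs.default.name", "hdfs://namenode.example.com:9000");
        FileSystem fs = FileSystem.get(conf);
        // Directory listings are metadata operations served by the NameNode;
        // no user data flows through it.
        for (FileStatus entry : fs.listStatus(new Path("/"))) {
          System.out.println(entry.getPath());
        }
      }
    }
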
<div id="" style="text-align: center;">
<img id="" class="figure" alt="HDFS Architecture"
src="images/hdfsarchitecture.gif"></div>
<p>
- The Namenode and Datanode are pieces of software designed to run on
commodity machines. These machines typically run a GNU/Linux operating system
(<acronym title="operating system">OS</acronym>). HDFS is built using the Java
language; any machine that supports Java can run the Namenode or the Datanode
software. Usage of the highly portable Java language means that HDFS can be
deployed on a wide range of machines. A typical deployment has a dedicated
machine that runs only the Namenode software. Each of the other machines in the
cluster runs one instance of the Datanode software. The architecture does not
preclude running multiple Datanodes on the same machine but in a real
deployment that is rarely the case.
+ The NameNode and DataNode are pieces of software designed to run on
commodity machines. These machines typically run a GNU/Linux operating system
(<acronym title="operating system">OS</acronym>). HDFS is built using the Java
language; any machine that supports Java can run the NameNode or the DataNode
software. Usage of the highly portable Java language means that HDFS can be
deployed on a wide range of machines. A typical deployment has a dedicated
machine that runs only the NameNode software. Each of the other machines in the
cluster runs one instance of the DataNode software. The architecture does not
preclude running multiple DataNodes on the same machine but in a real
deployment that is rarely the case.
</p>
<p>
- The existence of a single Namenode in a cluster greatly simplifies the
architecture of the system. The Namenode is the arbitrator and repository for
all HDFS metadata. The system is designed in such a way that user <em>data</em>
never flows through the Namenode.
+ The existence of a single NameNode in a cluster greatly simplifies the
architecture of the system. The NameNode is the arbitrator and repository for
all HDFS metadata. The system is designed in such a way that user data never
flows through the NameNode.
</p>
</div>
-<a name="N1008A"></a><a name="The+File+System+Namespace"></a>
+<a name="N10081"></a><a name="The+File+System+Namespace"></a>
<h2 class="h3"> The File System Namespace </h2>
<div class="section">
<p>
HDFS supports a traditional hierarchical file organization. A user or an
application can create directories and store files inside these directories.
The file system namespace hierarchy is similar to most other existing file
systems; one can create and remove files, move a file from one directory to
another, or rename a file. HDFS does not yet implement user quotas or access
permissions. HDFS does not support hard links or soft links. However, the HDFS
architecture does not preclude implementing these features.
</p>
<p>
- The Namenode maintains the file system namespace. Any change to the file
system namespace or its properties is recorded by the Namenode. An application
can specify the number of replicas of a file that should be maintained by HDFS.
The number of copies of a file is called the replication factor of that file.
This information is stored by the Namenode.
+ The NameNode maintains the file system namespace. Any change to the file
system namespace or its properties is recorded by the NameNode. An application
can specify the number of replicas of a file that should be maintained by HDFS.
The number of copies of a file is called the replication factor of that file.
This information is stored by the NameNode.
</p>
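
[Editor's note: the per-file replication factor mentioned here is set through the ordinary client API. A minimal sketch with a hypothetical path.]

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ReplicationExample {
      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path file = new Path("/user/demo/important.dat");      // hypothetical path
        FSDataOutputStream out = fs.create(file, (short) 3);   // ask for 3 replicas at creation time
        out.writeBytes("replicated data\n");
        out.close();
        fs.setReplication(file, (short) 5);                    // the factor can be changed later
      }
    }
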
</div>
-<a name="N10097"></a><a name="Data+Replication"></a>
+<a name="N1008E"></a><a name="Data+Replication"></a>
<h2 class="h3"> Data Replication </h2>
<div class="section">
<p>
HDFS is designed to reliably store very large files across machines in a
large cluster. It stores each file as a sequence of blocks; all blocks in a
file except the last block are the same size. The blocks of a file are
replicated for fault tolerance. The block size and replication factor are
configurable per file. An application can specify the number of replicas of a
file. The replication factor can be specified at file creation time and can be
changed later. Files in HDFS are write-once and have strictly one writer at any
time.
</p>
<p>
- The Namenode makes all decisions regarding replication of blocks. It
periodically receives a <em>Heartbeat</em> and a <em>Blockreport</em> from each
of the Datanodes in the cluster. Receipt of a Heartbeat implies that the
Datanode is functioning properly. A Blockreport contains a list of all blocks
on a Datanode.
+ The NameNode makes all decisions regarding replication of blocks. It
periodically receives a Heartbeat and a Blockreport from each of the DataNodes
in the cluster. Receipt of a Heartbeat implies that the DataNode is functioning
properly. A Blockreport contains a list of all blocks on a DataNode.
</p>
<div id="" style="text-align: center;">
-<img id="" class="figure" alt="HDFS Datanodes"
src="images/hdfsdatanodes.gif"></div>
-<a name="N100AD"></a><a name="Replica+Placement%3A+The+First+Baby+Steps"></a>
+<img id="" class="figure" alt="HDFS DataNodes"
src="images/hdfsdatanodes.gif"></div>
+<a name="N1009E"></a><a name="Replica+Placement%3A+The+First+Baby+Steps"></a>
<h3 class="h4"> Replica Placement: The First Baby Steps </h3>
<p>
The placement of replicas is critical to HDFS reliability and
performance. Optimizing replica placement distinguishes HDFS from most other
distributed file systems. This is a feature that needs lots of tuning and
experience. The purpose of a rack-aware replica placement policy is to improve
data reliability, availability, and network bandwidth utilization. The current
implementation for the replica placement policy is a first effort in this
direction. The short-term goals of implementing this policy are to validate it
on production systems, learn more about its behavior, and build a foundation to
test and research more sophisticated policies.
@@ -418,76 +418,76 @@
<p>
The current, default replica placement policy described here is a work
in progress.
</p>
-<a name="N100C7"></a><a name="Replica+Selection"></a>
+<a name="N100B8"></a><a name="Replica+Selection"></a>
<h3 class="h4"> Replica Selection </h3>
<p>
To minimize global bandwidth consumption and read latency, HDFS tries
to satisfy a read request from a replica that is closest to the reader. If
there exists a replica on the same rack as the reader node, then that replica
is preferred to satisfy the read request. If an HDFS cluster spans multiple
data centers, then a replica that is resident in the local data center is
preferred over any remote replica.
</p>
-<a name="N100D1"></a><a name="SafeMode"></a>
-<h3 class="h4"> SafeMode </h3>
+<a name="N100C2"></a><a name="Safemode"></a>
+<h3 class="h4"> Safemode </h3>
<p>
- On startup, the Namenode enters a special state called
<em>Safemode</em>. Replication of data blocks does not occur when the Namenode
is in the Safemode state. The Namenode receives Heartbeat and Blockreport
messages from the Datanodes. A Blockreport contains the list of data blocks
that a Datanode is hosting. Each block has a specified minimum number of
replicas. A block is considered <em>safely replicated</em> when the minimum
number of replicas of that data block has checked in with the Namenode. After a
configurable percentage of safely replicated data blocks checks in with the
Namenode (plus an additional 30 seconds), the Namenode exits the Safemode
state. It then determines the list of data blocks (if any) that still have
fewer than the specified number of replicas. The Namenode then replicates these
blocks to other Datanodes.
+ On startup, the NameNode enters a special state called Safemode.
Replication of data blocks does not occur when the NameNode is in the Safemode
state. The NameNode receives Heartbeat and Blockreport messages from the
DataNodes. A Blockreport contains the list of data blocks that a DataNode is
hosting. Each block has a specified minimum number of replicas. A block is
considered safely replicated when the minimum number of replicas of that data
block has checked in with the NameNode. After a configurable percentage of
safely replicated data blocks checks in with the NameNode (plus an additional
30 seconds), the NameNode exits the Safemode state. It then determines the list
of data blocks (if any) that still have fewer than the specified number of
replicas. The NameNode then replicates these blocks to other DataNodes.
</p>
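
[Editor's note: the "configurable percentage" and the extra 30 seconds mentioned above correspond to ordinary configuration properties; the sketch below assumes the dfs.safemode.threshold.pct and dfs.safemode.extension names used in this era of Hadoop.]

    import org.apache.hadoop.conf.Configuration;

    public class SafemodeSettingsExample {
      public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Assumed property names: fraction of blocks that must be safely replicated
        // before the NameNode leaves Safemode, and the extra wait in milliseconds.
        conf.setFloat("dfs.safemode.threshold.pct", 0.999f);
        conf.setLong("dfs.safemode.extension", 30000L);
        System.out.println(conf.getFloat("dfs.safemode.threshold.pct", 0.95f));
      }
    }

An administrator can also enter or leave the state manually with the bin/hadoop dfsadmin -safemode commands shown later in this document.
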
</div>
-<a name="N100E2"></a><a name="The+Persistence+of+File+System+Metadata"></a>
+<a name="N100CD"></a><a name="The+Persistence+of+File+System+Metadata"></a>
<h2 class="h3"> The Persistence of File System Metadata </h2>
<div class="section">
<p>
- The HDFS namespace is stored by the Namenode. The Namenode uses a
transaction log called the <em>EditLog</em> to persistently record every change
that occurs to file system <em>metadata</em>. For example, creating a new file
in HDFS causes the Namenode to insert a record into the EditLog indicating
this. Similarly, changing the replication factor of a file causes a new record
to be inserted into the EditLog. The Namenode uses a file in its <em>local</em>
host OS file system to store the EditLog. The entire file system namespace,
including the mapping of blocks to files and file system properties, is stored
in a file called the <em>FsImage</em>. The FsImage is stored as a file in the
Namenode’s local file system too.
+ The HDFS namespace is stored by the NameNode. The NameNode uses a
transaction log called the EditLog to persistently record every change that
occurs to file system metadata. For example, creating a new file in HDFS causes
the NameNode to insert a record into the EditLog indicating this. Similarly,
changing the replication factor of a file causes a new record to be inserted
into the EditLog. The NameNode uses a file in its local host OS file system to
store the EditLog. The entire file system namespace, including the mapping of
blocks to files and file system properties, is stored in a file called the
FsImage. The FsImage is stored as a file in the NameNode’s local file
system too.
</p>
<p>
- The Namenode keeps an image of the entire file system namespace and
file <em>Blockmap</em> in memory. This key metadata item is designed to be
compact, such that a Namenode with 4 GB of RAM is plenty to support a huge
number of files and directories. When the Namenode starts up, it reads the
FsImage and EditLog from disk, applies all the transactions from the EditLog to
the in-memory representation of the FsImage, and flushes out this new version
into a new FsImage on disk. It can then truncate the old EditLog because its
transactions have been applied to the persistent FsImage. This process is
called a <em>checkpoint</em>. In the current implementation, a checkpoint only
occurs when the Namenode starts up. Work is in progress to support periodic
checkpointing in the near future.
+ The NameNode keeps an image of the entire file system namespace and
file Blockmap in memory. This key metadata item is designed to be compact, such
that a NameNode with 4 GB of RAM is plenty to support a huge number of files
and directories. When the NameNode starts up, it reads the FsImage and EditLog
from disk, applies all the transactions from the EditLog to the in-memory
representation of the FsImage, and flushes out this new version into a new
FsImage on disk. It can then truncate the old EditLog because its transactions
have been applied to the persistent FsImage. This process is called a
checkpoint. In the current implementation, a checkpoint only occurs when the
NameNode starts up. Work is in progress to support periodic checkpointing in
the near future.
</p>
<p>
- The Datanode stores HDFS data in files in its local file system. The
Datanode has no knowledge about HDFS files. It stores each block of HDFS data
in a separate file in its local file system. The Datanode does not create all
files in the same directory. Instead, it uses a heuristic to determine the
optimal number of files per directory and creates subdirectories appropriately.
It is not optimal to create all local files in the same directory because the
local file system might not be able to efficiently support a huge number of
files in a single directory. When a Datanode starts up, it scans through its
local file system, generates a list of all HDFS data blocks that correspond to
each of these local files and sends this report to the Namenode: this is the
Blockreport.
+ The DataNode stores HDFS data in files in its local file system. The
DataNode has no knowledge about HDFS files. It stores each block of HDFS data
in a separate file in its local file system. The DataNode does not create all
files in the same directory. Instead, it uses a heuristic to determine the
optimal number of files per directory and creates subdirectories appropriately.
It is not optimal to create all local files in the same directory because the
local file system might not be able to efficiently support a huge number of
files in a single directory. When a DataNode starts up, it scans through its
local file system, generates a list of all HDFS data blocks that correspond to
each of these local files and sends this report to the NameNode: this is the
Blockreport.
</p>
</div>
-<a name="N10104"></a><a name="The+Communication+Protocols"></a>
+<a name="N100DD"></a><a name="The+Communication+Protocols"></a>
<h2 class="h3"> The Communication Protocols </h2>
<div class="section">
<p>
- All HDFS communication protocols are layered on top of the TCP/IP
protocol. A client establishes a connection to a configurable <acronym
title="Transmission Control Protocol">TCP</acronym> port on the Namenode
machine. It talks the <em>ClientProtocol</em> with the Namenode. The Datanodes
talk to the Namenode using the <em>DatanodeProtocol</em>. A Remote Procedure
Call (<acronym title="Remote Procedure Call">RPC</acronym>) abstraction wraps
both the ClientProtocol and the DatanodeProtocol. By design, the Namenode never
initiates any RPCs. Instead, it only responds to RPC requests issued by
Datanodes or clients.
+ All HDFS communication protocols are layered on top of the TCP/IP
protocol. A client establishes a connection to a configurable <acronym
title="Transmission Control Protocol">TCP</acronym> port on the NameNode
machine. It talks the ClientProtocol with the NameNode. The DataNodes talk to
the NameNode using the DataNode Protocol. A Remote Procedure Call (<acronym
title="Remote Procedure Call">RPC</acronym>) abstraction wraps both the Client
Protocol and the DataNode Protocol. By design, the NameNode never initiates any
RPCs. Instead, it only responds to RPC requests issued by DataNodes or clients.
</p>
</div>
-<a name="N1011C"></a><a name="Robustness"></a>
+<a name="N100EF"></a><a name="Robustness"></a>
<h2 class="h3"> Robustness </h2>
<div class="section">
<p>
- The primary objective of HDFS is to store data reliably even in the
presence of failures. The three common types of failures are Namenode failures,
Datanode failures and network partitions.
+ The primary objective of HDFS is to store data reliably even in the
presence of failures. The three common types of failures are NameNode failures,
DataNode failures and network partitions.
</p>
-<a name="N10125"></a><a
name="Data+Disk+Failure%2C+Heartbeats+and+Re-Replication"></a>
+<a name="N100F8"></a><a
name="Data+Disk+Failure%2C+Heartbeats+and+Re-Replication"></a>
<h3 class="h4"> Data Disk Failure, Heartbeats and Re-Replication </h3>
<p>
- Each Datanode sends a Heartbeat message to the Namenode periodically.
A network partition can cause a subset of Datanodes to lose connectivity with
the Namenode. The Namenode detects this condition by the absence of a Heartbeat
message. The Namenode marks Datanodes without recent Heartbeats as dead and
does not forward any new <acronym title="Input/Output">IO</acronym> requests to
them. Any data that was registered to a dead Datanode is not available to HDFS
any more. Datanode death may cause the replication factor of some blocks to
fall below their specified value. The Namenode constantly tracks which blocks
need to be replicated and initiates replication whenever necessary. The
necessity for re-replication may arise due to many reasons: a Datanode may
become unavailable, a replica may become corrupted, a hard disk on a Datanode
may fail, or the replication factor of a file may be increased.
+ Each DataNode sends a Heartbeat message to the NameNode periodically.
A network partition can cause a subset of DataNodes to lose connectivity with
the NameNode. The NameNode detects this condition by the absence of a Heartbeat
message. The NameNode marks DataNodes without recent Heartbeats as dead and
does not forward any new <acronym title="Input/Output">IO</acronym> requests to
them. Any data that was registered to a dead DataNode is not available to HDFS
any more. DataNode death may cause the replication factor of some blocks to
fall below their specified value. The NameNode constantly tracks which blocks
need to be replicated and initiates replication whenever necessary. The
necessity for re-replication may arise due to many reasons: a DataNode may
become unavailable, a replica may become corrupted, a hard disk on a DataNode
may fail, or the replication factor of a file may be increased.
</p>
-<a name="N10133"></a><a name="Cluster+Rebalancing"></a>
+<a name="N10106"></a><a name="Cluster+Rebalancing"></a>
<h3 class="h4"> Cluster Rebalancing </h3>
<p>
- The HDFS architecture is compatible with <em>data rebalancing
schemes</em>. A scheme might automatically move data from one Datanode to
another if the free space on a Datanode falls below a certain threshold. In the
event of a sudden high demand for a particular file, a scheme might dynamically
create additional replicas and rebalance other data in the cluster. These types
of data rebalancing schemes are not yet implemented.
+ The HDFS architecture is compatible with data rebalancing schemes. A
scheme might automatically move data from one DataNode to another if the free
space on a DataNode falls below a certain threshold. In the event of a sudden
high demand for a particular file, a scheme might dynamically create additional
replicas and rebalance other data in the cluster. These types of data
rebalancing schemes are not yet implemented.
</p>
-<a name="N10140"></a><a name="Data+Integrity"></a>
+<a name="N10110"></a><a name="Data+Integrity"></a>
<h3 class="h4"> Data Integrity </h3>
<p>
<!-- XXX "checksum checking" sounds funny -->
- It is possible that a block of data fetched from a Datanode arrives
corrupted. This corruption can occur because of faults in a storage device,
network faults, or buggy software. The HDFS client software implements checksum
checking on the contents of HDFS files. When a client creates an HDFS file, it
computes a checksum of each block of the file and stores these checksums in a
separate hidden file in the same HDFS namespace. When a client retrieves file
contents it verifies that the data it received from each Datanode matches the
checksum stored in the associated checksum file. If not, then the client can
opt to retrieve that block from another Datanode that has a replica of that
block.
+ It is possible that a block of data fetched from a DataNode arrives
corrupted. This corruption can occur because of faults in a storage device,
network faults, or buggy software. The HDFS client software implements checksum
checking on the contents of HDFS files. When a client creates an HDFS file, it
computes a checksum of each block of the file and stores these checksums in a
separate hidden file in the same HDFS namespace. When a client retrieves file
contents it verifies that the data it received from each DataNode matches the
checksum stored in the associated checksum file. If not, then the client can
opt to retrieve that block from another DataNode that has a replica of that
block.
</p>
-<a name="N1014C"></a><a name="Metadata+Disk+Failure"></a>
+<a name="N1011C"></a><a name="Metadata+Disk+Failure"></a>
<h3 class="h4"> Metadata Disk Failure </h3>
<p>
- The FsImage and the EditLog are central data structures of HDFS. A
corruption of these files can cause the HDFS instance to be non-functional. For
this reason, the Namenode can be configured to support maintaining multiple
copies of the FsImage and EditLog. Any update to either the FsImage or EditLog
causes each of the FsImages and EditLogs to get updated synchronously. This
synchronous updating of multiple copies of the FsImage and EditLog may degrade
the rate of namespace transactions per second that a Namenode can support.
However, this degradation is acceptable because even though HDFS applications
are very <em>data</em> intensive in nature, they are not <em>metadata</em>
intensive. When a Namenode restarts, it selects the latest consistent FsImage
and EditLog to use.
+ The FsImage and the EditLog are central data structures of HDFS. A
corruption of these files can cause the HDFS instance to be non-functional. For
this reason, the NameNode can be configured to support maintaining multiple
copies of the FsImage and EditLog. Any update to either the FsImage or EditLog
causes each of the FsImages and EditLogs to get updated synchronously. This
synchronous updating of multiple copies of the FsImage and EditLog may degrade
the rate of namespace transactions per second that a NameNode can support.
However, this degradation is acceptable because even though HDFS applications
are very data intensive in nature, they are not metadata intensive. When a
NameNode restarts, it selects the latest consistent FsImage and EditLog to use.
</p>
<p>
- The Namenode machine is a single point of failure for an HDFS cluster.
If the Namenode machine fails, manual intervention is necessary. Currently,
automatic restart and failover of the Namenode software to another machine is
not supported.
+ The NameNode machine is a single point of failure for an HDFS cluster.
If the NameNode machine fails, manual intervention is necessary. Currently,
automatic restart and failover of the NameNode software to another machine is
not supported.
</p>
-<a name="N1015F"></a><a name="Snapshots"></a>
+<a name="N10129"></a><a name="Snapshots"></a>
<h3 class="h4"> Snapshots </h3>
<p>
Snapshots support storing a copy of data at a particular instant of
time. One usage of the snapshot feature may be to roll back a corrupted HDFS
instance to a previously known good point in time. HDFS does not currently
support snapshots but will in a future release.
@@ -496,40 +496,40 @@
-<a name="N1016A"></a><a name="Data+Organization"></a>
+<a name="N10134"></a><a name="Data+Organization"></a>
<h2 class="h3"> Data Organization </h2>
<div class="section">
-<a name="N10172"></a><a name="Data+Blocks"></a>
+<a name="N1013C"></a><a name="Data+Blocks"></a>
<h3 class="h4"> Data Blocks </h3>
<p>
- HDFS is designed to support very large files. Applications that are
compatible with HDFS are those that deal with large data sets. These
applications write their data only once but they read it one or more times and
require these reads to be satisfied at streaming speeds. HDFS supports
write-once-read-many semantics on files. A typical block size used by HDFS is
64 MB. Thus, an HDFS file is chopped up into 64 MB chunks, and if possible,
each chunk will reside on a different Datanode.
+ HDFS is designed to support very large files. Applications that are
compatible with HDFS are those that deal with large data sets. These
applications write their data only once but they read it one or more times and
require these reads to be satisfied at streaming speeds. HDFS supports
write-once-read-many semantics on files. A typical block size used by HDFS is
64 MB. Thus, an HDFS file is chopped up into 64 MB chunks, and if possible,
each chunk will reside on a different DataNode.
</p>
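
[Editor's note: the block size, like the replication factor, can be chosen per file. A minimal sketch, assuming the dfs.block.size property name and a hypothetical path; the long create() overload is the standard FileSystem API.]

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class BlockSizeExample {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.setLong("dfs.block.size", 64L * 1024 * 1024);   // assumed property name; 64 MB
        FileSystem fs = FileSystem.get(conf);
        // Per-file override: create(path, overwrite, bufferSize, replication, blockSize)
        fs.create(new Path("/user/demo/big.dat"),            // hypothetical path
                  true, 4096, (short) 3, 128L * 1024 * 1024).close();
      }
    }
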
-<a name="N1017C"></a><a name="Staging"></a>
+<a name="N10146"></a><a name="Staging"></a>
<h3 class="h4"> Staging </h3>
<p>
- A client request to create a file does not reach the Namenode
immediately. In fact, initially the HDFS client caches the file data into a
temporary local file. Application writes are transparently redirected to this
temporary local file. When the local file accumulates data worth over one HDFS
block size, the client contacts the Namenode. The Namenode inserts the file
name into the file system hierarchy and allocates a data block for it. The
Namenode responds to the client request with the identity of the Datanode and
the destination data block. Then the client flushes the block of data from the
local temporary file to the specified Datanode. When a file is closed, the
remaining un-flushed data in the temporary local file is transferred to the
Datanode. The client then tells the Namenode that the file is closed. At this
point, the Namenode commits the file creation operation into a persistent
store. If the Namenode dies before the file is closed, the file is lost.
+ A client request to create a file does not reach the NameNode
immediately. In fact, initially the HDFS client caches the file data into a
temporary local file. Application writes are transparently redirected to this
temporary local file. When the local file accumulates data worth over one HDFS
block size, the client contacts the NameNode. The NameNode inserts the file
name into the file system hierarchy and allocates a data block for it. The
NameNode responds to the client request with the identity of the DataNode and
the destination data block. Then the client flushes the block of data from the
local temporary file to the specified DataNode. When a file is closed, the
remaining un-flushed data in the temporary local file is transferred to the
DataNode. The client then tells the NameNode that the file is closed. At this
point, the NameNode commits the file creation operation into a persistent
store. If the NameNode dies before the file is closed, the file is lost.
</p>
<p>
The above approach has been adopted after careful consideration of
target applications that run on HDFS. These applications need streaming writes
to files. If a client writes to a remote file directly without any client side
buffering, the network speed and the congestion in the network impacts
throughput considerably. This approach is not without precedent. Earlier
distributed file systems, e.g. <acronym title="Andrew File
System">AFS</acronym>, have used client side caching to improve performance. A
POSIX requirement has been relaxed to achieve higher performance of data
uploads.
</p>
-<a name="N1018F"></a><a name="Replication+Pipelining"></a>
+<a name="N10159"></a><a name="Replication+Pipelining"></a>
<h3 class="h4"> Replication Pipelining </h3>
<p>
- When a client is writing data to an HDFS file, its data is first
written to a local file as explained in the previous section. Suppose the HDFS
file has a replication factor of three. When the local file accumulates a full
block of user data, the client retrieves a list of Datanodes from the Namenode.
This list contains the Datanodes that will host a replica of that block. The
client then flushes the data block to the first Datanode. The first Datanode
starts receiving the data in small portions (4 KB), writes each portion to its
local repository and transfers that portion to the second Datanode in the list.
The second Datanode, in turn starts receiving each portion of the data block,
writes that portion to its repository and then flushes that portion to the
third Datanode. Finally, the third Datanode writes the data to its local
repository. Thus, a Datanode can be receiving data from the previous one in the
pipeline and at the same time forwarding data to the next one in the pipeline. Thus, the data is pipelined from one Datanode to the next.
+ When a client is writing data to an HDFS file, its data is first
written to a local file as explained in the previous section. Suppose the HDFS
file has a replication factor of three. When the local file accumulates a full
block of user data, the client retrieves a list of DataNodes from the NameNode.
This list contains the DataNodes that will host a replica of that block. The
client then flushes the data block to the first DataNode. The first DataNode
starts receiving the data in small portions (4 KB), writes each portion to its
local repository and transfers that portion to the second DataNode in the list.
The second DataNode, in turn starts receiving each portion of the data block,
writes that portion to its repository and then flushes that portion to the
third DataNode. Finally, the third DataNode writes the data to its local
repository. Thus, a DataNode can be receiving data from the previous one in the
pipeline and at the same time forwarding data to the next one in the pipeline. Thus, the data is pipelined from one DataNode to the next.
</p>
</div>
-<a name="N1019A"></a><a name="Accessibility"></a>
+<a name="N10164"></a><a name="Accessibility"></a>
<h2 class="h3"> Accessibility </h2>
<div class="section">
<p>
HDFS can be accessed from applications in many different ways. Natively,
HDFS provides a <a href="http://hadoop.apache.org/core/docs/current/api/">Java
API</a> for applications to use. A C language wrapper for this Java API is also
available. In addition, an HTTP browser can also be used to browse the files of
an HDFS instance. Work is in progress to expose HDFS through the <acronym
title="Web-based Distributed Authoring and Versioning">WebDAV</acronym>
protocol.
</p>
-<a name="N101AF"></a><a name="DFSShell"></a>
-<h3 class="h4"> DFSShell </h3>
+<a name="N10179"></a><a name="FS+Shell"></a>
+<h3 class="h4"> FS Shell </h3>
<p>
- HDFS allows user data to be organized in the form of files and
directories. It provides a commandline interface called <em>DFSShell</em> that
lets a user interact with the data in HDFS. The syntax of this command set is
similar to other shells (e.g. bash, csh) that users are already familiar with.
Here are some sample action/command pairs:
+ HDFS allows user data to be organized in the form of files and
directories. It provides a commandline interface called FS shell that lets a
user interact with the data in HDFS. The syntax of this command set is similar
to other shells (e.g. bash, csh) that users are already familiar with. Here are
some sample action/command pairs:
</p>
<table class="ForrestTable" cellspacing="1" cellpadding="4">
@@ -559,12 +559,12 @@
</table>
<p>
- DFSShell is targeted for applications that need a scripting language
to interact with the stored data.
+ FS shell is targeted for applications that need a scripting language
to interact with the stored data.
</p>
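
[Editor's note: applications that want the same command set from Java rather than a shell script can drive it programmatically; a minimal sketch, assuming the FsShell class and ToolRunner utility behave as in the standard Hadoop distribution.]

    import org.apache.hadoop.fs.FsShell;
    import org.apache.hadoop.util.ToolRunner;

    public class FsShellExample {
      public static void main(String[] args) throws Exception {
        // Equivalent to: bin/hadoop dfs -ls /user/demo   (hypothetical path)
        int exitCode = ToolRunner.run(new FsShell(), new String[] { "-ls", "/user/demo" });
        System.exit(exitCode);
      }
    }
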
-<a name="N10207"></a><a name="DFSAdmin"></a>
+<a name="N101CE"></a><a name="DFSAdmin"></a>
<h3 class="h4"> DFSAdmin </h3>
<p>
- The <em>DFSAdmin</em> command set is used for administering an HDFS
cluster. These are commands that are used only by an HDFS administrator. Here
are some sample action/command pairs:
+ The DFSAdmin command set is used for administering an HDFS cluster.
These are commands that are used only by an HDFS administrator. Here are some
sample action/command pairs:
</p>
<table class="ForrestTable" cellspacing="1" cellpadding="4">
@@ -576,24 +576,24 @@
<tr>
-<td colspan="1" rowspan="1"> Put a cluster in SafeMode </td> <td colspan="1"
rowspan="1"> <span class="codefrag">bin/hadoop dfsadmin -safemode enter</span>
</td>
+<td colspan="1" rowspan="1"> Put the cluster in Safemode </td> <td colspan="1"
rowspan="1"> <span class="codefrag">bin/hadoop dfsadmin -safemode enter</span>
</td>
</tr>
<tr>
-<td colspan="1" rowspan="1"> Generate a list of Datanodes </td> <td
colspan="1" rowspan="1"> <span class="codefrag">bin/hadoop dfsadmin
-report</span> </td>
+<td colspan="1" rowspan="1"> Generate a list of DataNodes </td> <td
colspan="1" rowspan="1"> <span class="codefrag">bin/hadoop dfsadmin
-report</span> </td>
</tr>
<tr>
-<td colspan="1" rowspan="1"> Decommission Datanode <span
class="codefrag">datanodename</span> </td><td colspan="1" rowspan="1"> <span
class="codefrag">bin/hadoop dfsadmin -decommission datanodename</span> </td>
+<td colspan="1" rowspan="1"> Decommission DataNode <span
class="codefrag">datanodename</span> </td><td colspan="1" rowspan="1"> <span
class="codefrag">bin/hadoop dfsadmin -decommission datanodename</span> </td>
</tr>
</table>
-<a name="N10255"></a><a name="Browser+Interface"></a>
+<a name="N10219"></a><a name="Browser+Interface"></a>
<h3 class="h4"> Browser Interface </h3>
<p>
A typical HDFS install configures a web server to expose the HDFS
namespace through a configurable TCP port. This allows a user to navigate the
HDFS namespace and view the contents of its files using a web browser.
@@ -601,27 +601,27 @@
</div>
-<a name="N10260"></a><a name="Space+Reclamation"></a>
+<a name="N10224"></a><a name="Space+Reclamation"></a>
<h2 class="h3"> Space Reclamation </h2>
<div class="section">
-<a name="N10266"></a><a name="File+Deletes+and+Undeletes"></a>
+<a name="N1022A"></a><a name="File+Deletes+and+Undeletes"></a>
<h3 class="h4"> File Deletes and Undeletes </h3>
<p>
- When a file is deleted by a user or an application, it is not
immediately removed from HDFS. Instead, HDFS first renames it to a file in the
<span class="codefrag">/trash</span> directory. The file can be restored
quickly as long as it remains in <span class="codefrag">/trash</span>. A file
remains in <span class="codefrag">/trash</span> for a configurable amount of
time. After the expiry of its life in <span class="codefrag">/trash</span>, the
Namenode deletes the file from the HDFS namespace. The deletion of a file
causes the blocks associated with the file to be freed. Note that there could
be an appreciable time delay between the time a file is deleted by a user and
the time of the corresponding increase in free space in HDFS.
+ When a file is deleted by a user or an application, it is not
immediately removed from HDFS. Instead, HDFS first renames it to a file in the
<span class="codefrag">/trash</span> directory. The file can be restored
quickly as long as it remains in <span class="codefrag">/trash</span>. A file
remains in <span class="codefrag">/trash</span> for a configurable amount of
time. After the expiry of its life in <span class="codefrag">/trash</span>, the
NameNode deletes the file from the HDFS namespace. The deletion of a file
causes the blocks associated with the file to be freed. Note that there could
be an appreciable time delay between the time a file is deleted by a user and
the time of the corresponding increase in free space in HDFS.
</p>
<p>
A user can Undelete a file after deleting it as long as it remains in
the <span class="codefrag">/trash</span> directory. If a user wants to undelete
a file that he/she has deleted, he/she can navigate the <span
class="codefrag">/trash</span> directory and retrieve the file. The <span
class="codefrag">/trash</span> directory contains only the latest copy of the
file that was deleted. The <span class="codefrag">/trash</span> directory is
just like any other directory with one special feature: HDFS applies specified
policies to automatically delete files from this directory. The current default
policy is to delete files from <span class="codefrag">/trash</span> that are
more than 6 hours old. In the future, this policy will be configurable through
a well defined interface.
</p>
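
[Editor's note: restoring a file is just a rename out of the trash directory. A minimal sketch, assuming the /trash location described above and a hypothetical file name.]

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class UndeleteExample {
      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path trashed = new Path("/trash/report.txt");        // hypothetical deleted file
        Path restored = new Path("/user/demo/report.txt");   // where to put it back
        if (fs.exists(trashed)) {
          fs.rename(trashed, restored);                      // undelete is an ordinary rename
        }
      }
    }
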
-<a name="N1028E"></a><a name="Decrease+Replication+Factor"></a>
+<a name="N10252"></a><a name="Decrease+Replication+Factor"></a>
<h3 class="h4"> Decrease Replication Factor </h3>
<p>
- When the replication factor of a file is reduced, the Namenode selects
excess replicas that can be deleted. The next Heartbeat transfers this
information to the Datanode. The Datanode then removes the corresponding blocks
and the corresponding free space appears in the cluster. Once again, there
might be a time delay between the completion of the <span
class="codefrag">setReplication</span> API call and the appearance of free
space in the cluster.
+ When the replication factor of a file is reduced, the NameNode selects
excess replicas that can be deleted. The next Heartbeat transfers this
information to the DataNode. The DataNode then removes the corresponding blocks
and the corresponding free space appears in the cluster. Once again, there
might be a time delay between the completion of the <span
class="codefrag">setReplication</span> API call and the appearance of free
space in the cluster.
</p>
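
[Editor's note: the setReplication API call named above is a method on FileSystem; a minimal sketch with a hypothetical path.]

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class DecreaseReplicationExample {
      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        // Lower the replication factor of a hypothetical file to 2; the NameNode
        // schedules the excess replicas for deletion on later Heartbeats.
        boolean accepted = fs.setReplication(new Path("/user/demo/big.dat"), (short) 2);
        System.out.println("setReplication accepted: " + accepted);
      }
    }
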
</div>
-<a name="N1029C"></a><a name="References"></a>
+<a name="N10260"></a><a name="References"></a>
<h2 class="h3"> References </h2>
<div class="section">
<p>