Dear Wiki user, You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.
The "HDFS-RAID" page has been changed by PatrickKling. The comment on this change is: added sections on BlockFixer and RaidShell.
http://wiki.apache.org/hadoop/HDFS-RAID?action=diff&rev1=2&rev2=3

--------------------------------------------------

   * the DRFS client, which provides application access to the files in the DRFS and transparently recovers any corrupt or missing blocks encountered when reading a file,
   * the RaidNode, a daemon that creates and maintains parity files for all data files stored in the DRFS,
   * the BlockFixer, which periodically recomputes blocks that have been lost or corrupted,
-  * the RaidFsck utility, which allows the administrator to manually trigger the recomputation of missing or corrupt blocks and to check for files that have become irrecoverably corrupted.
+  * the RaidShell utility, which allows the administrator to manually trigger the recomputation of missing or corrupt blocks and to check for files that have become irrecoverably corrupted.
  
  === DRFS client ===
@@ -33, +33 @@
  It is important to note that while the DRFS client recomputes missing blocks when reading corrupt files, it does not insert these missing blocks back into the file system. Instead, it discards them once the application request has been fulfilled.
- The BlockFixer daemon and the RaidFsck tool can be used to persistently fix bad blocks.
+ The BlockFixer daemon and the RaidShell tool can be used to persistently fix bad blocks.
  
  === RaidNode ===
@@ -55, +55 @@
  (currently under development)
- The BlockFixer is a daemon that runs at the RaidNode
+ The BlockFixer is a daemon that runs at the RaidNode and periodically inspects the health of the paths for which DRFS is configured.
+ When a file with missing or corrupt blocks is encountered, these blocks are recomputed and inserted back into the file system.
- === RaidFsck ===
+ There are two implementations of the BlockFixer:
+  * the LocalBlockFixer, which recomputes bad blocks locally at the RaidNode.
+  * the DistributedBlockFixer, which dispatches MapReduce jobs to recompute blocks.
+ 
+ === RaidShell ===
  (currently under development)
+ 
+ The RaidShell is a tool that allows the administrator to maintain and inspect a DRFS. It supports commands for manually triggering the
+ recomputation of bad data blocks and also allows the administrator to display a list of irrecoverable files (i.e., files for which too
+ many data or parity blocks have been lost).
+ 
  == Using HDFS RAID ==
@@ -199, +209 @@
  </property>
  }}}
- === Administration ===
+ === Running DRFS ===
  The DRFS provides support for administration at runtime without any downtime to cluster services.
  It is possible to add/delete new paths to be raided without
- interrupting any load on the cluster. If you change raid.xml, its contents will be
- reload within seconds and the new contents will take effect immediately.
+ interrupting any load on the cluster. Changes to `raid.xml` are detected periodically (every few seconds)
+ and new policies are applied immediately.
  
  Designate one machine in your cluster to run the RaidNode software. You can run this daemon on any machine irrespective of whether that machine is running any other Hadoop daemon or not. You can start the RaidNode by running the following on the selected machine:
+ {{{
  nohup $HADOOP_HOME/bin/hadoop org.apache.hadoop.raid.RaidNode >> /xxx/logs/hadoop-root-raidnode-hadoop.xxx.com.log &
+ }}}
- Optionally, we provide two scripts to start and stop the RaidNode. Copy the scripts
+ We also provide two scripts to start and stop the RaidNode more easily. Copy the scripts
- start-raidnode.sh and stop-raidnode.sh to the directory $HADOOP_HOME/bin in the machine
+ `start-raidnode.sh` and `stop-raidnode.sh` to the directory `$HADOOP_HOME/bin` on the machine
- you would like to deploy the daemon. You can start or stop the RaidNode by directly
+ where the RaidNode is to be deployed. You can then start or stop the RaidNode by directly
- callying the scripts from that machine. If you want to deploy the RaidNode remotely,
+ calling these scripts on that machine. To deploy the RaidNode remotely,
- copy start-raidnode-remote.sh and stop-raidnode-remote.sh to $HADOOP_HOME/bin at
+ copy `start-raidnode-remote.sh` and `stop-raidnode-remote.sh` to `$HADOOP_HOME/bin` at
  the machine from which you want to trigger the remote deployment and create a text
- file $HADOOP_HOME/conf/raidnode at the same machine containing the name of the server
+ file `$HADOOP_HOME/conf/raidnode` on the same machine containing the name of the machine
- where the RaidNode should run. These scripts run ssh to the specified machine and
+ where the RaidNode should be deployed. These scripts ssh to the specified machine and
- invoke start/stop-raidnode.sh there. As an example, you might want to change
+ invoke `start-raidnode.sh`/`stop-raidnode.sh` there.
+ 
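+ The shipped scripts may differ in detail, but `start-raidnode-remote.sh` is essentially a wrapper along the
+ following lines (this sketch assumes `$HADOOP_HOME` is set identically on the local and the remote machine):
+ {{{
+ #!/bin/sh
+ # Look up the machine that should run the RaidNode ...
+ RAIDNODE=`cat $HADOOP_HOME/conf/raidnode`
+ # ... and invoke the local start script there via ssh.
+ ssh $RAIDNODE "$HADOOP_HOME/bin/start-raidnode.sh"
+ }}}
+ 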
+ For easy maintenance, you might want to change
- start-mapred.sh in the JobTracker machine so that it automatically calls
+ `start-mapred.sh` on the JobTracker machine so that it automatically calls
- start-raidnode-remote.sh (and do the equivalent thing for stop-mapred.sh and
+ `start-raidnode-remote.sh` (and make a similar change to `stop-mapred.sh` to call
- stop-raidnode-remote.sh).
+ `stop-raidnode-remote.sh`).
+ 
+ To monitor the health of a DRFS, use the fsck command provided by the RaidShell.
- Run fsckraid periodically (being developed as part of another JIRA). This validates parity
- blocks of a file.
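+ Since the RaidShell is still under development, the exact command syntax may change; an invocation along the
+ following lines (class name as used for the RaidNode above, `-fsck` option name assumed here) would check the
+ entire file system for irrecoverably corrupt files:
+ {{{
+ $HADOOP_HOME/bin/hadoop org.apache.hadoop.raid.RaidShell -fsck /
+ }}}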
