Dear Wiki user, You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.
The "HDFS-RAID" page has been changed by PatrickKling. The comment on this change is: added sections on BlockFixer and RaidShell.
http://wiki.apache.org/hadoop/HDFS-RAID?action=diff&rev1=2&rev2=3

--------------------------------------------------

   * the DRFS client, which provides application access to the files in the DRFS and transparently recovers any corrupt or missing blocks encountered when reading a file,
   * the RaidNode, a daemon that creates and maintains parity files for all data files stored in the DRFS,
   * the BlockFixer, which periodically recomputes blocks that have been lost or corrupted,
-  * the RaidFsck utility, which allows the administrator to manually trigger the recomputation of missing or corrupt blocks and to check for files that have become irrecoverably corrupted.
+  * the RaidShell utility, which allows the administrator to manually trigger the recomputation of missing or corrupt blocks and to check for files that have become irrecoverably corrupted.
  
  === DRFS client ===
@@ -33, +33 @@
  It is important to note that while the DRFS client recomputes missing blocks when reading corrupt files, it does not insert these missing blocks back into the file system. Instead, it discards them once the application request has been fulfilled.
- The BlockFixer daemon and the RaidFsck tool can be used to persistently fix bad blocks.
+ The BlockFixer daemon and the RaidShell tool can be used to persistently fix bad blocks.
  
  === RaidNode ===
@@ -55, +55 @@
  (currently under development)
- The BlockFixer is a daemon that runs at the RaidNode
+ The BlockFixer is a daemon that runs at the RaidNode and periodically inspects the health of the paths for which DRFS is configured.
+ When a file with missing or corrupt blocks is encountered, these blocks are recomputed and inserted back into the file system.
- === RaidFsck ===
+ There are two implementations of the BlockFixer:
+  * the LocalBlockFixer, which recomputes bad blocks locally at the RaidNode.
+  * the DistributedBlockFixer, which dispatches MapReduce jobs to recompute blocks.
+ 
+ === RaidShell ===
  (currently under development)
+ 
+ The RaidShell is a tool that allows the administrator to maintain and inspect a DRFS. It supports commands for manually triggering the
+ recomputation of bad data blocks and also allows the administrator to display a list of irrecoverable files (i.e., files for which too
+ many data or parity blocks have been lost).
+ 
  == Using HDFS RAID ==
@@ -199, +209 @@
  </property>
  }}}
- === Administration ===
+ === Running DRFS ===
  The DRFS provides support for administration at runtime without any downtime to cluster services.
  It is possible to add/delete new paths to be raided without
- interrupting any load on the cluster. If you change raid.xml, its contents will be
- reload within seconds and the new contents will take effect immediately.
+ interrupting any load on the cluster. Changes to `raid.xml` are detected periodically (every few seconds)
+ and new policies are applied immediately.
  
  Designate one machine in your cluster to run the RaidNode software. You can run this daemon on any machine irrespective of whether that machine is running any other Hadoop daemon or not. You can start the RaidNode by running the following on the selected machine:
+ {{{
  nohup $HADOOP_HOME/bin/hadoop org.apache.hadoop.raid.RaidNode >> /xxx/logs/hadoop-root-raidnode-hadoop.xxx.com.log &
+ }}}
- Optionally, we provide two scripts to start and stop the RaidNode. Copy the scripts
+ We also provide two scripts to start and stop the RaidNode more easily. Copy the scripts
- start-raidnode.sh and stop-raidnode.sh to the directory $HADOOP_HOME/bin in the machine
+ `start-raidnode.sh` and `stop-raidnode.sh` to the directory `$HADOOP_HOME/bin` on the machine
- you would like to deploy the daemon. You can start or stop the RaidNode by directly
+ where the RaidNode is to be deployed. You can then start or stop the RaidNode by directly
- callying the scripts from that machine. If you want to deploy the RaidNode remotely,
+ calling these scripts on that machine. To deploy the RaidNode remotely,
- copy start-raidnode-remote.sh and stop-raidnode-remote.sh to $HADOOP_HOME/bin at
+ copy `start-raidnode-remote.sh` and `stop-raidnode-remote.sh` to `$HADOOP_HOME/bin` at
  the machine from which you want to trigger the remote deployment and create a text
- file $HADOOP_HOME/conf/raidnode at the same machine containing the name of the server
+ file `$HADOOP_HOME/conf/raidnode` on the same machine containing the name of the machine
- where the RaidNode should run. These scripts run ssh to the specified machine and
+ where the RaidNode should be deployed. These scripts ssh to the specified machine and
- invoke start/stop-raidnode.sh there. As an example, you might want to change
+ invoke `start-raidnode.sh`/`stop-raidnode.sh` there.
+ 
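+ The shipped scripts may differ in detail, but `start-raidnode-remote.sh` is essentially a wrapper along the
+ following lines (this sketch assumes `$HADOOP_HOME` is set identically on the local and the remote machine):
+ {{{
+ #!/bin/sh
+ # Look up the machine that should run the RaidNode ...
+ RAIDNODE=`cat $HADOOP_HOME/conf/raidnode`
+ # ... and invoke the local start script there via ssh.
+ ssh $RAIDNODE "$HADOOP_HOME/bin/start-raidnode.sh"
+ }}}
+ 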
+ For easy maintenance, you might want to change
- start-mapred.sh in the JobTracker machine so that it automatically calls
+ `start-mapred.sh` on the JobTracker machine so that it automatically calls
- start-raidnode-remote.sh (and do the equivalent thing for stop-mapred.sh and
+ `start-raidnode-remote.sh` (and make a similar change to `stop-mapred.sh` to call
- stop-raidnode-remote.sh).
+ `stop-raidnode-remote.sh`).
+ 
+ To monitor the health of a DRFS, use the fsck command provided by the RaidShell.
- Run fsckraid periodically (being developed as part of another JIRA). This validates parity
- blocks of a file.
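+ Since the RaidShell is still under development, the exact command syntax may change; an invocation along the
+ following lines (class name as used for the RaidNode above, `-fsck` option name assumed here) would check the
+ entire file system for irrecoverably corrupt files:
+ {{{
+ $HADOOP_HOME/bin/hadoop org.apache.hadoop.raid.RaidShell -fsck /
+ }}}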
