[Hadoop Wiki] Trivial Update of "LargeClusterTips" by SteveLoughran

Apache Wiki Wed, 24 Jun 2009 03:18:44 -0700

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change 
notification.


The following page has been changed by SteveLoughran:
http://wiki.apache.org/hadoop/LargeClusterTips

The comment on the change is:
formatting

------------------------------------------------------------------------------
  
  Things will go wrong. There is always a SPOF. Test your failure-handling 
processes before you go live. 
  
- * Simulate a corrupted edit log by killing the namenode process, truncating 
the (binary) edit log, and bringing it up. See how the team handles it. 
+  * Simulate a corrupted edit log by killing the namenode process, truncating 
the (binary) edit log, and bringing it up. See how the team handles it. 
- * Turn off one of the switches, pull out a network cable. See how the cluster 
handles it, how it recovers. Then put the switch back on.
+  * Turn off one of the switches, pull out a network cable. See how the 
cluster handles it, how it recovers. Then put the switch back on.
- * Turn an entire rack off without warning. See what happens when they go 
offline.
+  * Turn an entire rack off without warning. See what happens when they go 
offline.
- * Turn off DNS. 
+  * Turn off DNS. Or just the rDNS side of things.
- * Turn off the entire datacenter, switch it back on. Are there any race 
conditions?
+  * Turn off the entire datacenter, switch it back on. Are there any race 
conditions?
- * Write a job that tries to generate too much data and fills up the cluster. 
How is it handled?
+  * Write a job that tries to generate too much data and fills up the cluster. 
How is it handled?
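
The edit-log drill in the first bullet could be prepared with something like the following. This is only a minimal sketch: the helper name `truncate_copy` and the idea of working on a copy are assumptions for illustration, not part of the wiki page, and it knows nothing about where a given cluster keeps its edit log. It copies a file and cuts the copy short, so the original stays available for the recovery exercise. Stop the namenode before touching any real file.

```python
import os
import shutil


def truncate_copy(src_path, dst_path, keep_fraction=0.5):
    """Copy src_path to dst_path, then cut the copy down to keep_fraction of its size.

    Hypothetical helper for a failure drill: the source file is left untouched
    and only the copy is damaged. Returns the size of the truncated copy in bytes.
    """
    if not 0.0 <= keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be between 0 and 1")
    shutil.copyfile(src_path, dst_path)
    new_size = int(os.path.getsize(dst_path) * keep_fraction)
    # Open for read/write in binary mode so truncate() shortens the existing file.
    with open(dst_path, "r+b") as f:
        f.truncate(new_size)
    return new_size
```

Swapping the damaged copy in for the real edit log, and restoring the original afterwards, is left as a manual step so that nobody runs it against a live namenode by accident.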
