Not a complete list by far, but it's a start:
For HDFS:
- Make sure you run Java 6 (jdk1.6).
- Set the namenode handler count to 40 or more (dfs.namenode.handler.count,
and maybe mapred.job.tracker.handler.count etc.).
- More config guides are in the works:
https://issues.apache.org/jira/browse/HADOOP-1917
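A rough sketch of how the handler settings above might look in the site config. This assumes your overrides live in conf/hadoop-site.xml (check the file name for your version) and uses 40 only as the starting point mentioned above, not as a tuned value. The jobtracker entry is the "maybe" one, so treat it as optional:

```xml
<?xml version="1.0"?>
<!-- Assumed location: conf/hadoop-site.xml; adjust for your install. -->
<configuration>

  <!-- Number of RPC server threads on the namenode. -->
  <property>
    <name>dfs.namenode.handler.count</name>
    <value>40</value>
  </property>

  <!-- Optional: same idea for the jobtracker's RPC server threads. -->
  <property>
    <name>mapred.job.tracker.handler.count</name>
    <value>40</value>
  </property>

</configuration>
```

The namenode and jobtracker read these at startup, so they would need a restart to pick up the change.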
Raghu.
Derek Gottfrid wrote:
Are there configuration suggestions for 1k nodes? I was seeing tons
of timeouts trying to run 1k nodes. Are there network settings that I
need to make? Out of the box stuff seemed to work up to a couple
hundred but I want to go bigger. Pointers/Suggestions?
derek
ps: i wrote up my ec2/hadoop at nytimes.com - check it out
http://open.blogs.nytimes.com/2007/11/01/self-service-prorated-super-computing-fun/