On Fri, 2005-11-04 at 19:15 -0800, Doug Cutting wrote:
> Rod Taylor wrote:
> > There is only a single datanode and there are 20 hosts.
>
> That's a lot of load on one datanode. I typically run a datanode on
> every host, accessing the local drives on that host.

I tried running one datanode per machine, each connecting back to the
same SAN, but it seemed pretty clunky. A crash of any datanode would
take down the entire system (no data replication, since it's a common
data-store in the end). Reducing it to a single datanode did not have
this impact.

The boxes themselves don't have much for local drives aside from a bit
of temp space.

Recently we moved the datanode, namenode and jobtracker to their own
machine per your earlier suggestion, and upgraded the Nutch sources from
about October 20th to November 1st. This is when the difficulties
started.

Earlier, with the single datanode, namenode and jobtracker on an
overloaded worker machine (load average was normally around 20), things
worked without errors, but slowly.

-- 
Rod Taylor <[EMAIL PROTECTED]>
