As Dhruba said, so long as it is just the namenode that is running on a
networked file system, everything should be chill. The namenode keeps all of
its working metadata in main memory; on disk it only appends small records to
its edit log and occasionally writes out a checkpoint of the namespace image
(and if I remember correctly, you can adjust the checkpoint interval in one of
the site files).
However, you are going to run into serious performance problems running
datanodes over a networked storage system. Pushing that much block I/O across
the network for any respectable MapReduce job is going to swamp your network
and storage equipment.
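To put rough numbers on this, here is a back-of-envelope sketch comparing the
aggregate read bandwidth of local datanode disks with a single shared link to
networked storage. The cluster size, per-disk throughput, and link speed are
hypothetical placeholders, not measurements. The sketch also ignores protocol
overhead, caching, and the storage array's own limits, all of which tend to
make the shared case worse.

```python
# Back-of-envelope comparison: local datanode disks vs. one shared storage link.
# All input numbers below are hypothetical placeholders, not measurements.


def local_aggregate_mb_s(nodes: int, disks_per_node: int, mb_s_per_disk: float) -> float:
    """Aggregate sequential read bandwidth if every datanode reads its own local disks."""
    return nodes * disks_per_node * mb_s_per_disk


def link_mb_s(gbit_per_s: float) -> float:
    """Rough payload ceiling of one network link, ignoring protocol overhead (1 Gbit/s = 125 MB/s)."""
    return gbit_per_s * 1000 / 8


nodes, disks, per_disk = 20, 4, 60.0                   # hypothetical cluster
local = local_aggregate_mb_s(nodes, disks, per_disk)   # every map task reads locally
shared = link_mb_s(10.0)                               # every read crosses one 10 GbE link
print(f"local disks: {local:.0f} MB/s, shared 10 GbE link: {shared:.0f} MB/s, "
      f"ratio: {local / shared:.1f}x")
```

Local disk bandwidth grows with every datanode you add, while the shared link
stays fixed, so the gap only widens as the cluster grows.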
- Grant
On Oct 21 2009, Jonathan Seidman wrote:
Apologies if this has been answered previously, but I'm unable to find
anything that seems to cover this.
It's clear that datanodes require local storage for Hadoop to function
efficiently, but is there any significant disadvantage to using external
storage for namenodes? We're exploring the possibility of using a
different class of hardware for our namenodes with attached storage and
little or no internal storage. Some of the benefits this would provide us are:
1) allowing our sysadmins to deploy hardware that they're familiar with and
already have considerable experience maintaining in a production environment;
2) no namenode downtime to replace a failed disk.
We don't anticipate that this approach would cause any significant
degradation to performance, but let me know if there's something we're not
considering.
Thanks.
Jonathan
--
Grant Mackey
PhD student Computer Engineering
University of Central Florida
Rm 231 cube 5 (321) 960-8851