Hi Doug,

> Chris Mattmann wrote:
> > One thing I noticed is that on my linux cluster, the
> > additional options "-o ConnectTimeout=1 -o SendEnv=HADOOP_CONF_DIR" are not
> > available on my system.
>
> You can override this by editing conf/hadoop-env.sh. Both are optional,
> but convenient. Perhaps we should avoid using them until they're in
> wider distribution.
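Here is a minimal sketch of the override Doug describes. It assumes the setting in conf/hadoop-env.sh is a variable named HADOOP_SSH_OPTS, which is an assumption you should check against your copy of the file. Leaving the variable empty makes ssh run with no extra options. One presumable side effect is that HADOOP_CONF_DIR is no longer forwarded to the slaves, so they would use their own default conf directory.

```shell
# conf/hadoop-env.sh (fragment). Assumption: the extra ssh options are read
# from HADOOP_SSH_OPTS; check the variable name in your copy of the file.

# The options some older ssh clients reject:
#   -o ConnectTimeout=1 -o SendEnv=HADOOP_CONF_DIR
# Override with no extra options so a stock ssh client works.
export HADOOP_SSH_OPTS=""
```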
Yeah, I think it would be nice to find an alternative to explicitly using these options. I'm running a 16-node/32-processor Linux cluster with 64-bit processors, running CentOS Linux and the ROCKS toolkit. I think it's a pretty standard cluster distribution, and its SSH doesn't support those options. I don't want my environment to dictate what Hadoop should do for everyone else, but I think it's definitely something to consider...

> > Any ideas? Additionally, would it make sense to put in options within the
> > startup script to only start DFS related daemons and slaves, and the same
> > goes for only starting MapRed daemons and slaves? If so, I can create a JIRA
> > issue about this.
>
> Instead of 'bin/start-all.sh' that's just 'bin/hadoop-daemon.sh start
> namenode; bin/hadoop-daemons.sh start datanode'. Is that what you're
> after? I guess we could add a bin/start-dfs.sh command that does this.

Yeah, that's exactly what I'm after: some simple, non-technical, joe-user way of just starting and stopping DFS, without anything to do with MapReduce. I'm interested in Hadoop just for its DFS capabilities for now. My group at JPL is looking into HDFS as a solution to a data movement issue that will arise on our cluster when we begin to stage large (gigabyte-sized) files between an NFS-mounted RAID disk and cluster nodes whose jobs need non-NFS access to those files (thousands of jobs reading from a single NFS-mounted RAID can become a bottleneck). So right now I'm benchmarking HDFS along with PVFS (Parallel Virtual File System) as potential solutions to the data movement issue between the cluster nodes and our NFS RAIDs.

Anyway, getting back to the point: yes, it would be great to have something to just start and stop DFS (and start and stop MapReduce, for that matter), even if it just amounts to the simple command you mentioned above.
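The option-based alternative raised in the quoted question, a single startup script that takes a target instead of separate scripts, might look roughly like this. "hadoop-ctl.sh" and hadoop_ctl are hypothetical names used only for illustration. The sketch prints the hadoop-daemon(s).sh commands rather than running them, so it can be tried without a cluster.

```shell
#!/bin/bash
# Hypothetical option-based control script; "hadoop-ctl.sh" and hadoop_ctl are
# illustrative names, not part of Hadoop. It prints the daemon commands for
#   {start|stop} {dfs|mapred}
# instead of running them, so it can be tried without a cluster.

usage() {
  echo "usage: hadoop-ctl.sh {start|stop} {dfs|mapred}" >&2
  return 1
}

hadoop_ctl() {
  local action=$1 target=$2
  case "$action" in
    start|stop) ;;
    *) usage; return ;;
  esac
  case "$target" in
    dfs)
      if [ "$action" = start ]; then
        # datanodes first, following the "start namenode after datanodes"
        # comment in the attached start-dfs.sh
        echo "bin/hadoop-daemons.sh start datanode"
        echo "bin/hadoop-daemon.sh start namenode"
      else
        echo "bin/hadoop-daemon.sh stop namenode"
        echo "bin/hadoop-daemons.sh stop datanode"
      fi
      ;;
    mapred)
      # jobtracker on the master before tasktrackers on the slaves
      echo "bin/hadoop-daemon.sh $action jobtracker"
      echo "bin/hadoop-daemons.sh $action tasktracker"
      ;;
    *) usage ;;
  esac
}
```

From the Hadoop install directory, `hadoop_ctl start dfs | bash` would then run the printed commands. Printing first makes it easy to check which daemons will be touched before anything starts.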
I've attached a quick patch that adds bin/start-dfs.sh, bin/stop-dfs.sh, bin/start-mapred.sh and bin/stop-mapred.sh.

Thanks, Doug!

Cheers,
  Chris
Index: bin/start-dfs.sh
===================================================================
--- bin/start-dfs.sh (revision 0)
+++ bin/start-dfs.sh (revision 0)
@@ -0,0 +1,14 @@
+#!/bin/bash
+
+# Start hadoop DFS daemons. Run this on master node.
+
+bin=`dirname "$0"`
+bin=`cd "$bin"; pwd`
+
+# start dfs daemons
+# start namenode after datanodes, to minimize time namenode is up w/o data
+# note: datanodes will log connection errors until namenode starts
+
+"$bin"/hadoop-daemons.sh start datanode
+"$bin"/hadoop-daemon.sh start namenode
+

Property changes on: bin/start-dfs.sh
___________________________________________________________________
Name: svn:executable
   + *

Index: bin/start-mapred.sh
===================================================================
--- bin/start-mapred.sh (revision 0)
+++ bin/start-mapred.sh (revision 0)
@@ -0,0 +1,13 @@
+#!/bin/bash
+
+# Start hadoop map reduce daemons. Run this on master node.
+
+bin=`dirname "$0"`
+bin=`cd "$bin"; pwd`
+
+# start mapred daemons
+# start jobtracker first to minimize connection errors at startup
+
+"$bin"/hadoop-daemon.sh start jobtracker
+"$bin"/hadoop-daemons.sh start tasktracker
+

Property changes on: bin/start-mapred.sh
___________________________________________________________________
Name: svn:executable
   + *

Index: bin/stop-dfs.sh
===================================================================
--- bin/stop-dfs.sh (revision 0)
+++ bin/stop-dfs.sh (revision 0)
@@ -0,0 +1,10 @@
+#!/bin/bash
+
+# Stop hadoop DFS daemons. Run this on master node.
+
+bin=`dirname "$0"`
+bin=`cd "$bin"; pwd`
+
+"$bin"/hadoop-daemon.sh stop namenode
+"$bin"/hadoop-daemons.sh stop datanode
+

Property changes on: bin/stop-dfs.sh
___________________________________________________________________
Name: svn:executable
   + *

Index: bin/stop-mapred.sh
===================================================================
--- bin/stop-mapred.sh (revision 0)
+++ bin/stop-mapred.sh (revision 0)
@@ -0,0 +1,10 @@
+#!/bin/bash
+
+# Stop hadoop map reduce daemons. Run this on master node.
+
+bin=`dirname "$0"`
+bin=`cd "$bin"; pwd`
+
+"$bin"/hadoop-daemon.sh stop jobtracker
+"$bin"/hadoop-daemons.sh stop tasktracker
+

Property changes on: bin/stop-mapred.sh
___________________________________________________________________
Name: svn:executable
   + *
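With the four scripts in place, bin/start-all.sh and bin/stop-all.sh could simply delegate to them. The sketch below writes them as functions that take the bin directory, an illustrative choice that lets them be exercised with stub scripts. A real start-all.sh would compute bin with the same dirname/pwd lines the attached scripts use. Starting DFS before MapReduce and stopping them in the reverse order is an assumed ordering, not something the patch specifies.

```shell
#!/bin/bash
# Sketch: start-all/stop-all expressed in terms of the four new scripts.
# Functions take the bin directory (illustrative, so they can be tried with
# stubs); a real bin/start-all.sh would compute bin via dirname/pwd instead.

start_all() {
  local bin=$1
  # start DFS first; skip MapReduce if start-dfs.sh reports failure
  "$bin"/start-dfs.sh && "$bin"/start-mapred.sh
}

stop_all() {
  local bin=$1
  # stop MapReduce first, then DFS, even if the first step reports an error
  "$bin"/stop-mapred.sh
  "$bin"/stop-dfs.sh
}
```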
