Chris Mattmann wrote:
Hi Folks,

  I'm new to this list (but a familiar face on the Nutch one :-) ), and I have
a newbie question. When I start Hadoop DFS using the start-all.sh script in
the bin directory, it starts a single namenode via hadoop-daemon.sh and then
uses slaves.sh to start all the slave datanodes. One thing I noticed is that
on my Linux cluster, the extra ssh options "-o ConnectTimeout=1 -o
SendEnv=HADOOP_CONF_DIR" are not available. I tried "man ssh_config", and the
closest thing I can find to ConnectTimeout is:


     ConnectionAttempts
             Specifies the number of tries (one per second) to make before
             exiting.  The argument must be an integer.  This may be useful
             in scripts if the connection sometimes fails.  The default is 1.


Is this normal? Typing uname -a on my machine results in:

Linux <XXX> 2.4.21-37.XXX.ELsmp #1 SMP Tue Oct 18 11:43:19 PDT 2005 x86_64
x86_64 x86_64 GNU/Linux

When I remove those options from the slaves.sh script, Hadoop DFS starts
successfully. (I commented out the part of start-all.sh that starts the
MapReduce daemons, because I'm only trying to use DFS.)
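For reference, after my edit the ssh loop in bin/slaves.sh looks roughly like
this (a sketch from memory; the exact variable names in your copy of slaves.sh
may differ):

    # bin/slaves.sh: run the given command on every host in conf/slaves.
    # I dropped "-o ConnectTimeout=1 -o SendEnv=HADOOP_CONF_DIR" from the
    # ssh invocation so it works with my OpenSSH 3.6 client:
    for slave in `cat "$HOSTLIST"`; do
      ssh "$slave" "$@" 2>&1 | sed "s/^/$slave: /" &
    done
    wait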

Here is the output of ssh -V on my system:

[EMAIL PROTECTED] ~/hadoop]$ ssh -V
OpenSSH_3.6.1p2, SSH protocols 1.5/2.0, OpenSSL 0x0090701f

Yeah, on older ssh neither option is available. "OpenSSH_3.5p1n, SSH protocols
1.5/2.0, OpenSSL 0x0090609f" has neither option, whereas "OpenSSH_4.2p1
Debian-6, OpenSSL 0.9.8a 11 Oct 2005" has both. Looking over the release
notes, SendEnv shows up in release 3.9; I don't see a note on when
ConnectTimeout was added (I didn't spend long looking).

Here are the definitions from the ssh_config man page:

ConnectTimeout
Specifies the timeout (in seconds) used when connecting to the
ssh server, instead of using the default system TCP timeout.
This value is used only when the target is down or really
unreachable, not when it refuses the connection.

...

SendEnv
Specifies what variables from the local environ(7) should be sent
to the server. Note that environment passing is only supported
for protocol 2, the server must also support it, and the server
must be configured to accept these environment variables. Refer
to AcceptEnv in sshd_config(5) for how to configure the server.
Variables are specified by name, which may contain the wildcard
characters ‘*’ and ‘?’. Multiple environment variables may be
separated by whitespace or spread across multiple SendEnv
directives. The default is not to send any environment variables.
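
So for SendEnv to do anything, both ends have to agree. A minimal sketch
(stock OpenSSH file locations; HADOOP_CONF_DIR as used by the Hadoop scripts):

    # Client side, in ~/.ssh/config or /etc/ssh/ssh_config:
    Host *
        ConnectTimeout 1
        SendEnv HADOOP_CONF_DIR

    # Server side, in /etc/ssh/sshd_config (restart sshd afterwards):
    AcceptEnv HADOOP_CONF_DIR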

If you've not noticed already, you can blank out the ssh options in
hadoop-env.sh by setting HADOOP_SSH_OPTS to the empty string. Everything
should still work; you just won't have a timeout on your ssh attempts, and you
won't be able to forward the head node's HADOOP_* environment variables out to
the slaves.
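
That is, something like this in conf/hadoop-env.sh (a sketch; by default the
variable carries the two -o options discussed above):

    # conf/hadoop-env.sh: clear the extra ssh options so the scripts
    # fall back to a plain "ssh host command", which old OpenSSH
    # clients accept.
    export HADOOP_SSH_OPTS=""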

St.Ack
