Hello,

I'm evaluating Hadoop for a large GIS application.

When running the wordcount example, I hit an issue where my master node cannot open a socket to port 50010 on my remote slave node.

When I run the example with only my master in the slaves file, it works fine. When I add a second machine, I get the error.
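As a basic sanity check, the port can be probed by hand from the master (telnet ships with OS X; Corum.local is just my slave's Bonjour name):

  telnet Corum.local 50010

If that hangs at "Trying 10.0.1.8..." the connection attempts are presumably being dropped somewhere en route; an immediate "Connection refused" would instead mean the machine is reachable but nothing is listening on 50010.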

Here is my config:

Running Hadoop 0.3.2
OS X 10.4.7 Server for master (Elric.local - 10.0.1.4)
OS X 10.4.7 for remote slave (Corum.local - 10.0.1.8)
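
Both machines are on the same subnet and addressed by their Bonjour .local names; to rule out name resolution, each name can be checked against the addresses above with something like:

  ping -c 1 Corum.local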

Using the standard hadoop-default.xml.

Here's my hadoop-site.xml (which is the same on both machines):

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
<property>
  <name>fs.default.name</name>
  <value>Elric.local:9000</value>
  <description>
    The name of the default file system. Either the literal string
    "local" or a host:port for NDFS.
  </description>
</property>

<property>
  <name>mapred.job.tracker</name>
  <value>Elric.local:9001</value>
  <description>
    The host and port that the MapReduce job tracker runs at. If
    "local", then jobs are run in-process as a single map and
    reduce task.
  </description>
</property>

<property>
  <name>mapred.map.tasks</name>
  <value>12</value>
  <description>
    Define mapred.map.tasks to be the number of slave hosts.
  </description>
</property>

<property>
  <name>mapred.reduce.tasks</name>
  <value>12</value>
  <description>
    Define mapred.reduce.tasks to be the number of slave hosts.
  </description>
</property>

<property>
  <name>dfs.name.dir</name>
  <value>/nutch/filesystem/name</value>
</property>

<property>
  <name>dfs.data.dir</name>
  <value>/nutch/filesystem/data</value>
</property>

<property>
  <name>mapred.system.dir</name>
  <value>/nutch/filesystem/mapreduce/system</value>
</property>

<property>
  <name>mapred.local.dir</name>
  <value>/nutch/filesystem/mapreduce/local</value>
</property>

<property>
  <name>dfs.replication</name>
  <value>2</value>
</property>

<property>
  <name>dfs.datanode.port</name>
  <value>50010</value>
  <description>
    The port number that the dfs datanode server uses as a starting
    point to look for a free port to listen on.
  </description>
</property>

<property>
  <name>dfs.namenode.logging.level</name>
  <value>debug</value>
  <description>
    The logging level for dfs namenode. Other values are "dir" (trace
    namespace mutations), "block" (trace block under/over replications
    and block creations/deletions), or "all".
  </description>
</property>
</configuration>
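
In case it helps with diagnosis: as I understand it, whether the datanode on the slave actually bound to 50010 can be confirmed by running this on Corum and looking for a *.50010 line in LISTEN state:

  netstat -an | grep 50010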

Here is the terminal output on the server:

Elric:/nutch/hadoop nutch$ ./start-all.sh
-su: ./start-all.sh: No such file or directory
Elric:/nutch/hadoop nutch$ ./bin/start-all.sh
rsync from Elric.local:/nutch/hadoop
starting namenode, logging to /nutch/hadoop/logs/hadoop-nutch-namenode-Elric.local.out
Elric.local: rsync from Elric.local:/nutch/hadoop
10.0.1.8: rsync from Elric.local:/nutch/hadoop
Elric.local: starting datanode, logging to /nutch/hadoop/logs/hadoop-nutch-datanode-Elric.local.out
10.0.1.8: starting datanode, logging to /nutch/hadoop/logs/hadoop-nutch-datanode-Corum.local.out
rsync from Elric.local:/nutch/hadoop
starting jobtracker, logging to /nutch/hadoop/logs/hadoop-nutch-jobtracker-Elric.local.out
Elric.local: rsync from Elric.local:/nutch/hadoop
10.0.1.8: rsync from Elric.local:/nutch/hadoop
Elric.local: starting tasktracker, logging to /nutch/hadoop/logs/hadoop-nutch-tasktracker-Elric.local.out
10.0.1.8: starting tasktracker, logging to /nutch/hadoop/logs/hadoop-nutch-tasktracker-Corum.local.out
Elric:/nutch/hadoop nutch$ ./bin/hadoop jar hadoop-*-examples.jar wordcount cat out4
06/07/31 11:15:45 INFO conf.Configuration: parsing file:/nutch/hadoop/conf/hadoop-default.xml
06/07/31 11:15:45 INFO conf.Configuration: parsing file:/nutch/hadoop/conf/mapred-default.xml
06/07/31 11:15:45 INFO conf.Configuration: parsing file:/nutch/hadoop/conf/hadoop-site.xml
06/07/31 11:15:45 INFO ipc.Client: Client connection to 10.0.1.4:9000: starting
06/07/31 11:15:45 INFO ipc.Client: Client connection to 10.0.1.4:9001: starting
06/07/31 11:15:45 INFO conf.Configuration: parsing file:/nutch/hadoop/conf/hadoop-default.xml
06/07/31 11:15:45 INFO conf.Configuration: parsing file:/nutch/hadoop/conf/hadoop-site.xml
06/07/31 11:18:47 INFO fs.DFSClient: Waiting to find target node: Corum.local/10.0.1.8:50010


Here is the netstat on the server while waiting (note the three connections to corum.local.50010 stuck in SYN_SENT, which suggests the connection attempts are going unanswered):

Elric:/nutch/hadoop/logs ty$ netstat
Active Internet connections
Proto Recv-Q Send-Q  Local Address          Foreign Address        (state)
tcp4       0      0  10.0.1.4.49808         corum.local.50010      SYN_SENT
tcp4       0      0  10.0.1.4.49807         corum.local.50010      SYN_SENT
tcp4       0      0  10.0.1.4.49806         corum.local.50010      SYN_SENT
tcp4       0      0  10.0.1.4.etlservicemgr 10.0.1.4.49795         ESTABLISHED
tcp4       0      0  10.0.1.4.49795         10.0.1.4.etlservicemgr ESTABLISHED
tcp4       0      0  10.0.1.4.cslistener    10.0.1.4.49794         ESTABLISHED
tcp4       0      0  10.0.1.4.49794         10.0.1.4.cslistener    ESTABLISHED
tcp4       0      0  10.0.1.4.cslistener    corum.local.49265      ESTABLISHED
tcp4       0      0  10.0.1.4.etlservicemgr corum.local.49264      ESTABLISHED
tcp4       0      0  10.0.1.4.cslistener    10.0.1.4.49791         ESTABLISHED
tcp4       0      0  10.0.1.4.49791         10.0.1.4.cslistener    ESTABLISHED
tcp4       0      0  10.0.1.4.etlservicemgr 10.0.1.4.49790         ESTABLISHED
tcp4       0      0  10.0.1.4.49790         10.0.1.4.etlservicemgr ESTABLISHED
tcp4       0      0  10.0.1.4.cslistener    10.0.1.4.49784         ESTABLISHED
tcp4       0      0  10.0.1.4.49784         10.0.1.4.cslistener    ESTABLISHED
tcp4       0      0  10.0.1.4.cslistener    corum.local.49260      ESTABLISHED
tcp4       0      0  10.0.1.4.cslistener    10.0.1.4.49780         ESTABLISHED
tcp4       0      0  10.0.1.4.49780         10.0.1.4.cslistener    ESTABLISHED
tcp4       0      0  10.0.1.4.49756         mail.mac.com.imap      ESTABLISHED
tcp4       0      0  10.0.1.4.49732         mail.mac.com.imap      ESTABLISHED
tcp4       0      0  localhost.netinfo-loca localhost.1015         ESTABLISHED
tcp4       0      0  localhost.1015         localhost.netinfo-loca ESTABLISHED
tcp4       0      0  localhost.ipulse-ics   localhost.49174        ESTABLISHED
tcp4       0      0  localhost.49174        localhost.ipulse-ics   ESTABLISHED
tcp4       0      0  localhost.ipulse-ics   localhost.49173        ESTABLISHED
tcp4       0      0  localhost.49173        localhost.ipulse-ics   ESTABLISHED
tcp4       0      0  localhost.ipulse-ics   localhost.49172        ESTABLISHED
tcp4       0      0  localhost.49172        localhost.ipulse-ics   ESTABLISHED
tcp4       0      0  localhost.netinfo-loca localhost.1017         ESTABLISHED
tcp4       0      0  localhost.1017         localhost.netinfo-loca ESTABLISHED
tcp4       0      0  localhost.netinfo-loca localhost.1021         ESTABLISHED
tcp4       0      0  localhost.1021         localhost.netinfo-loca ESTABLISHED
udp4       0      0  localhost.49292        localhost.1023
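
Given those SYN_SENT entries, my current suspicion is a firewall on the slave silently dropping the packets. If I understand OS X 10.4 correctly, the built-in firewall is ipfw (configured under System Preferences > Sharing > Firewall), so the active rules on Corum could be inspected with:

  sudo ipfw list

If the firewall is enabled there, port 50010 would presumably need to be opened on the slave.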

The exception I get in the log is:

2006-07-31 11:17:46,390 INFO org.apache.hadoop.dfs.DataNode: Received block blk_316809370547197643 from /10.0.1.4
2006-07-31 11:18:31,185 WARN org.apache.hadoop.dfs.DataNode: Failed to transfer blk_-8788276503516502504 to Corum.local/10.0.1.8:50010
java.net.SocketTimeoutException: connect timed out
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
        at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
        at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:430)
        at java.net.Socket.connect(Socket.java:507)
        at org.apache.hadoop.dfs.DataNode$DataTransfer.run(DataNode.java:782)
        at java.lang.Thread.run(Thread.java:613)


Can anyone help me identify the issue?

Thanks!

Ty
