Hi,

I've been running Hadoop for three months on a two-machine test cluster
and never had a single problem with it. However, since moving from version
0.20.0 to version 0.20.1, the tasktracker goes down frequently. HDFS
still works: I can read and write files, and even run HBase over it. The
problem is that I can't schedule jobs anymore. The logs are below. Does
anybody know what may be happening? Any clue? The tasktracker dies and
doesn't come back until I restart it.

Thanks in advance,
Lucas





Log file: hadoop-root-tasktracker-server2.log

2009-10-17 18:06:39,044 INFO org.apache.hadoop.mapred.IndexCache: Map ID
attempt_200910151507_2024_m_000001_0 not found in cache
2009-10-17 18:06:39,044 INFO org.apache.hadoop.mapred.TaskRunner:
attempt_200910151507_2024_m_000000_0 done; removing files.
2009-10-17 18:07:12,103 ERROR org.apache.hadoop.mapred.TaskTracker: Caught
exception: java.io.IOException: Call to server2/192.168.1.3:9001
failed on local exception: java.io.EOFException
        at org.apache.hadoop.ipc.Client.wrapException(Client.java:774)
        at org.apache.hadoop.ipc.Client.call(Client.java:742)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
        at org.apache.hadoop.mapred.$Proxy4.heartbeat(Unknown Source)
        at
org.apache.hadoop.mapred.TaskTracker.transmitHeartBeat(TaskTracker.java:1215)
        at
org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:1037)
        at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:1720)
        at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:2833)
Caused by: java.io.EOFException
        at java.io.DataInputStream.readInt(DataInputStream.java:375)
        at
org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:501)
        at org.apache.hadoop.ipc.Client$Connection.run(Client.java:446)

2009-10-17 18:07:12,104 INFO org.apache.hadoop.mapred.TaskTracker: Resending
'status' to 'server2' with reponseId '-4573
2009-10-17 18:07:13,105 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: server2/192.168.1.3:9001. Already tried 0 time(s).
2009-10-17 18:07:14,105 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: server2/192.168.1.3:9001. Already tried 1 time(s).
2009-10-17 18:07:15,106 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: server2/192.168.1.3:9001. Already tried 2 time(s).
2009-10-17 18:07:16,106 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: server2/192.168.1.3:9001. Already tried 3 time(s).
2009-10-17 18:07:17,106 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: server2/192.168.1.3:9001. Already tried 4 time(s).
2009-10-17 18:07:18,107 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: server2/192.168.1.3:9001. Already tried 5 time(s).
2009-10-17 18:07:19,107 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: server2/192.168.1.3:9001. Already tried 6 time(s).
2009-10-17 18:07:20,108 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: server2/192.168.1.3:9001. Already tried 7 time(s).
2009-10-17 18:07:21,108 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: server2/192.168.1.3:9001. Already tried 8 time(s).
2009-10-17 18:07:22,109 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: server2/192.168.1.3:9001. Already tried 9 time(s).
2009-10-17 18:07:22,111 ERROR org.apache.hadoop.mapred.TaskTracker: Caught
exception: java.net.ConnectException: Call to
server2/192.168.1.3:9001 failed on connection exception:
java.net.ConnectException: Connection refused
        at org.apache.hadoop.ipc.Client.wrapException(Client.java:766)
        at org.apache.hadoop.ipc.Client.call(Client.java:742)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
        at org.apache.hadoop.mapred.$Proxy4.heartbeat(Unknown Source)
        at
org.apache.hadoop.mapred.TaskTracker.transmitHeartBeat(TaskTracker.java:1215)
        at
org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:1037)
        at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:1720)
        at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:2833)
Caused by: java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
        at
org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:404)
        at
org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:304)
        at
org.apache.hadoop.ipc.Client$Connection.access$1700(Client.java:176)
        at org.apache.hadoop.ipc.Client.getConnection(Client.java:859)
        at org.apache.hadoop.ipc.Client.call(Client.java:719)
        ... 6 more

2009-10-17 18:07:22,111 INFO org.apache.hadoop.mapred.TaskTracker: Resending
'status' to 'server2' with reponseId '-4573
2009-10-17 18:07:23,112 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: server2/192.168.1.3:9001. Already tried 0 time(s).
2009-10-17 18:07:24,112 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: server2/192.168.1.3:9001. Already tried 1 time(s).
2009-10-17 18:07:25,113 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: server2/192.168.1.3:9001. Already tried 2 time(s).
2009-10-17 18:07:26,113 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: server2/192.168.1.3:9001. Already tried 3 time(s).
2009-10-17 18:07:27,114 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: server2/192.168.1.3:9001. Already tried 4 time(s).
2009-10-17 18:07:28,114 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: server2/192.168.1.3:9001. Already tried 5 time(s).
2009-10-17 18:07:29,115 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: server2/192.168.1.3:9001. Already tried 6 time(s).
2009-10-17 18:07:30,115 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: server2/192.168.1.3:9001. Already tried 7 time(s).
2009-10-17 18:07:31,116 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: server2/192.168.1.3:9001. Already tried 8 time(s).
2009-10-17 18:07:32,116 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: server2/192.168.1.3:9001. Already tried 9 time(s).





Log file: hadoop-root-tasktracker-server.log

2009-10-17 18:06:41,680 INFO org.apache.hadoop.mapred.TaskTracker: Received
'KillJobAction' for job: job_200910151507_2024
2009-10-17 18:06:41,680 WARN org.apache.hadoop.mapred.TaskTracker: Unknown
job job_200910151507_2024 being deleted.
2009-10-17 18:07:11,695 ERROR org.apache.hadoop.mapred.TaskTracker: Caught
exception: java.io.IOException: Call to server2/192.168.1.3:9001
failed on local exception: java.io.EOFException
        at org.apache.hadoop.ipc.Client.wrapException(Client.java:774)
        at org.apache.hadoop.ipc.Client.call(Client.java:742)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
        at org.apache.hadoop.mapred.$Proxy4.heartbeat(Unknown Source)
        at
org.apache.hadoop.mapred.TaskTracker.transmitHeartBeat(TaskTracker.java:1215)
        at
org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:1037)
        at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:1720)
        at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:2833)
Caused by: java.io.EOFException
        at java.io.DataInputStream.readInt(DataInputStream.java:375)
        at
org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:501)
        at org.apache.hadoop.ipc.Client$Connection.run(Client.java:446)

2009-10-17 18:07:11,695 INFO org.apache.hadoop.mapred.TaskTracker: Resending
'status' to 'server2' with reponseId '-4584
2009-10-17 18:07:12,696 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: server2/192.168.1.3:9001. Already tried 0 time(s).
2009-10-17 18:07:13,697 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: server2/192.168.1.3:9001. Already tried 1 time(s).
2009-10-17 18:07:14,698 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: server2/192.168.1.3:9001. Already tried 2 time(s).
2009-10-17 18:07:15,699 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: server2/192.168.1.3:9001. Already tried 3 time(s).
2009-10-17 18:07:16,699 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: server2/192.168.1.3:9001. Already tried 4 time(s).
2009-10-17 18:07:17,700 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: server2/192.168.1.3:9001. Already tried 5 time(s).
2009-10-17 18:07:18,701 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: server2/192.168.1.3:9001. Already tried 6 time(s).
2009-10-17 18:07:19,701 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: server2/192.168.1.3:9001. Already tried 7 time(s).
2009-10-17 18:07:20,702 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: server2/192.168.1.3:9001. Already tried 8 time(s).
2009-10-17 18:07:21,703 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: server2/192.168.1.3:9001. Already tried 9 time(s).
2009-10-17 18:07:21,704 ERROR org.apache.hadoop.mapred.TaskTracker: Caught
exception: java.net.ConnectException: Call to
server2/192.168.1.3:9001 failed on connection exception:
java.net.ConnectException: Connection refused
        at org.apache.hadoop.ipc.Client.wrapException(Client.java:766)
        at org.apache.hadoop.ipc.Client.call(Client.java:742)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
        at org.apache.hadoop.mapred.$Proxy4.heartbeat(Unknown Source)
        at
org.apache.hadoop.mapred.TaskTracker.transmitHeartBeat(TaskTracker.java:1215)
        at
org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:1037)
        at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:1720)
        at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:2833)
Caused by: java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
        at
org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:404)
        at
org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:304)
        at
org.apache.hadoop.ipc.Client$Connection.access$1700(Client.java:176)
        at org.apache.hadoop.ipc.Client.getConnection(Client.java:859)
        at org.apache.hadoop.ipc.Client.call(Client.java:719)
        ... 6 more





This happens when I try to run a job.

r...@server2:/usr/local/hadoop-0.20.1# bin/hadoop jar ninvest-feedcrawler-0.3.0.jar com.nash.ninvest.backend.feed.Main /ninvest/feeds news news_checksum news_indexable 1
09/10/19 07:31:24 INFO ipc.Client: Retrying connect to server: server2/192.168.1.3:9001. Already tried 0 time(s).
09/10/19 07:31:25 INFO ipc.Client: Retrying connect to server: server2/192.168.1.3:9001. Already tried 1 time(s).
09/10/19 07:31:26 INFO ipc.Client: Retrying connect to server: server2/192.168.1.3:9001. Already tried 2 time(s).
09/10/19 07:31:27 INFO ipc.Client: Retrying connect to server: server2/192.168.1.3:9001. Already tried 3 time(s).
09/10/19 07:31:28 INFO ipc.Client: Retrying connect to server: server2/192.168.1.3:9001. Already tried 4 time(s).
09/10/19 07:31:29 INFO ipc.Client: Retrying connect to server: server2/192.168.1.3:9001. Already tried 5 time(s).
09/10/19 07:31:30 INFO ipc.Client: Retrying connect to server: server2/192.168.1.3:9001. Already tried 6 time(s).
09/10/19 07:31:31 INFO ipc.Client: Retrying connect to server: server2/192.168.1.3:9001. Already tried 7 time(s).
09/10/19 07:31:32 INFO ipc.Client: Retrying connect to server: server2/192.168.1.3:9001. Already tried 8 time(s).
09/10/19 07:31:33 INFO ipc.Client: Retrying connect to server: server2/192.168.1.3:9001. Already tried 9 time(s).
Exception in thread "main" java.net.ConnectException: Call to server2/192.168.1.3:9001 failed on connection exception: java.net.ConnectException: Connection refused
        at org.apache.hadoop.ipc.Client.wrapException(Client.java:766)
        at org.apache.hadoop.ipc.Client.call(Client.java:742)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
        at org.apache.hadoop.mapred.$Proxy0.getProtocolVersion(Unknown Source)
        at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:359)
        at
org.apache.hadoop.mapred.JobClient.createRPCProxy(JobClient.java:429)
        at org.apache.hadoop.mapred.JobClient.init(JobClient.java:423)
        at org.apache.hadoop.mapred.JobClient.<init>(JobClient.java:410)
        at org.apache.hadoop.mapreduce.Job.<init>(Job.java:50)
        at com.nash.ninvest.backend.feed.Main.createSubmittableJob(Unknown Source)
        at com.nash.ninvest.backend.feed.Main.run(Unknown Source)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at com.nash.ninvest.backend.feed.Main.main(Unknown Source)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
        at
org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:404)
        at
org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:304)
        at
org.apache.hadoop.ipc.Client$Connection.access$1700(Client.java:176)
        at org.apache.hadoop.ipc.Client.getConnection(Client.java:859)
        at org.apache.hadoop.ipc.Client.call(Client.java:719)
        ... 16 more
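
Every trace above ends in "Connection refused" on server2/192.168.1.3:9001, so it may be worth checking whether anything is listening on that port at all when the failure occurs. Below is a minimal sketch in Python of such a check. It is not part of Hadoop; the function name can_connect and the 3-second default timeout are made up for illustration, and only the host and port come from the logs.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper for illustration only. It tests plain TCP
    reachability and says nothing about whether the service behind the
    port is healthy.
    """
    try:
        # create_connection resolves the host and opens a TCP socket;
        # the with-block closes it again immediately.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and name resolution failures.
        return False
```

Calling can_connect("server2", 9001) from each machine while jobs fail to schedule would show whether the refusal is seen from both nodes or only one. A False result from both would suggest that the process expected on port 9001 is not running, rather than a network problem between the machines.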
