Hi All:
Environment: 4-node grid running hadoop-0.14.1.
With the system shut down, I wiped out the old HDFS directory structure and
created an empty directory. I then ran a namenode format and brought the
system up with start-all.sh.
I then set up a directory structure and ran the first job. The job completes
100% of the map tasks and ~87% of the reduce tasks, and then hangs.
There are no user-level error messages. All nodes go idle.
I started looking at the Hadoop logs. The first strange message is from the
namenode log:
2007-10-30 13:48:01,991 WARN org.apache.hadoop.dfs.StateChange: DIR* NameSystem.startFile: failed to create file /import/raw_logs/20070929/_task_200710301345_0001_r_000001_0/part-00001 for DFSClient_task_200710301345_0001_r_000001_0 on client 10.2.11.4 because current leaseholder is trying to recreate file.
2007-10-30 13:48:01,992 INFO org.apache.hadoop.ipc.Server: IPC Server handler 9 on 54310, call create(/import/raw_logs/20070929/_task_200710301345_0001_r_000001_0/part-00001, DFSClient_task_200710301345_0001_r_000001_0, true, 3, 67108864) from 10.2.11.4:34016: error: org.apache.hadoop.dfs.AlreadyBeingCreatedException: failed to create file /import/raw_logs/20070929/_task_200710301345_0001_r_000001_0/part-00001 for DFSClient_task_200710301345_0001_r_000001_0 on client 10.2.11.4 because current leaseholder is trying to recreate file.
org.apache.hadoop.dfs.AlreadyBeingCreatedException: failed to create file /import/raw_logs/20070929/_task_200710301345_0001_r_000001_0/part-00001 for DFSClient_task_200710301345_0001_r_000001_0 on client 10.2.11.4 because current leaseholder is trying to recreate file.
    at org.apache.hadoop.dfs.FSNamesystem.startFileInternal(FSNamesystem.java:788)
    at org.apache.hadoop.dfs.FSNamesystem.startFile(FSNamesystem.java:725)
    at org.apache.hadoop.dfs.NameNode.create(NameNode.java:307)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:340)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:566)
The second strange message comes from the jobtracker log:
2007-10-30 13:59:26,190 INFO org.apache.hadoop.mapred.JobTracker: Ignoring 'duplicate' heartbeat from 'tracker_localhost.localdomain:50050'
I'm not sure how to proceed. I suspect that my code is OK, as I've run
it numerous times in both single-node and multi-node grid environments, and
I've never seen these error messages before.
Any help is much appreciated.
Thanks,
C G