Hi All:
   
  Environment:  4 node grid running hadoop-0.14.1.  
   
  With the system shut down, I wiped out the old HDFS directory structure and 
created an empty directory.  I then formatted the namenode and brought the 
system up with start-all.sh.
   
  I then set up a directory structure and ran the first job.  The job completes 
100% of the map tasks, gets to ~87% of the reduce tasks, and then hangs.  
There are no user-level error messages, and all systems go idle.
   
  I started looking at the Hadoop logs.  The first strange message is from the 
namenode log:
   
  2007-10-30 13:48:01,991 WARN org.apache.hadoop.dfs.StateChange: DIR* NameSystem.startFile: failed to create file /import/raw_logs/20070929/_task_200710301345_0001_r_000001_0/part-00001 for DFSClient_task_200710301345_0001_r_000001_0 on client 10.2.11.4 because current leaseholder is trying to recreate file.
2007-10-30 13:48:01,992 INFO org.apache.hadoop.ipc.Server: IPC Server handler 9 on 54310, call create(/import/raw_logs/20070929/_task_200710301345_0001_r_000001_0/part-00001, DFSClient_task_200710301345_0001_r_000001_0, true, 3, 67108864) from 10.2.11.4:34016: error: org.apache.hadoop.dfs.AlreadyBeingCreatedException: failed to create file /import/raw_logs/20070929/_task_200710301345_0001_r_000001_0/part-00001 for DFSClient_task_200710301345_0001_r_000001_0 on client 10.2.11.4 because current leaseholder is trying to recreate file.
org.apache.hadoop.dfs.AlreadyBeingCreatedException: failed to create file /import/raw_logs/20070929/_task_200710301345_0001_r_000001_0/part-00001 for DFSClient_task_200710301345_0001_r_000001_0 on client 10.2.11.4 because current leaseholder is trying to recreate file.
        at org.apache.hadoop.dfs.FSNamesystem.startFileInternal(FSNamesystem.java:788)
        at org.apache.hadoop.dfs.FSNamesystem.startFile(FSNamesystem.java:725)
        at org.apache.hadoop.dfs.NameNode.create(NameNode.java:307)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:340)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:566)

  The second strange message comes from the jobtracker log:
  2007-10-30 13:59:26,190 INFO org.apache.hadoop.mapred.JobTracker: Ignoring 
'duplicate' heartbeat from 'tracker_localhost.localdomain:50050'
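
  One thing I'm guessing at from that message: both trackers seem to be 
identifying themselves as localhost.localdomain, so I plan to check whether 
the nodes' hostnames are distinct.  A rough sketch of the check I have in 
mind (the helper name is my own, not anything in Hadoop):

```shell
# Hypothetical helper: flag a loopback-style hostname, since two trackers
# reporting the same name would look like "duplicate" heartbeats to the
# JobTracker.  Run on each node; every node should print a distinct OK line.
check_tracker_name() {
  case "$1" in
    localhost*|*.localdomain)
      echo "SUSPECT: $1 is a loopback-style name; check /etc/hosts" ;;
    *)
      echo "OK: $1" ;;
  esac
}

check_tracker_name "$(hostname -f)"
```

If every node turns out to report localhost.localdomain, I'd expect fixing 
/etc/hosts (and the conf/slaves entries) to be the next step, but I'm not 
sure that's the cause.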
   
  I'm curious about how to proceed.  I suspect my code is OK, since I've run 
it numerous times in both single-node and multi-node grid environments, and 
I've never seen these error messages before.
   
  Any help much appreciated....
   
  Thanks,
  C G 
