Re: Job hangs, strange error messages...help sought...

Ted Dunning Tue, 30 Oct 2007 10:34:36 -0800

Is it possible that some of your nodes are running more than one datanode instance?
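
One rough way to check this on each node, offered as a sketch rather than a definitive diagnostic: the Python snippet below counts processes whose command line mentions the datanode main class. The class name org.apache.hadoop.dfs.DataNode is an assumption based on the package names in the quoted log, the helper names are made up for illustration, and a POSIX-style `ps` is assumed.

```python
import subprocess

# Assumption: in hadoop-0.14.x the datanode JVM is started with this main
# class (same org.apache.hadoop.dfs package that appears in the quoted log).
DATANODE_CLASS = "org.apache.hadoop.dfs.DataNode"


def count_datanodes(ps_lines):
    """Count command lines that mention the datanode main class."""
    return sum(1 for line in ps_lines if DATANODE_CLASS in line)


def local_datanode_count():
    """List all processes on this host with `ps` and count datanode JVMs."""
    result = subprocess.run(
        ["ps", "-A", "-o", "args"],  # -A: all processes, args: full command line
        capture_output=True,
        text=True,
        check=True,
    )
    return count_datanodes(result.stdout.splitlines())
```

Calling local_datanode_count() on each of the four nodes should give 1 where a datanode is expected; anything higher would suggest a leftover instance from before the restart.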



On 10/30/07 11:06 AM, "C G" <[EMAIL PROTECTED]> wrote:

> Hi All:
>    
>   Environment:  4 node grid running hadoop-0.14.1.
>    
>   With the system shut down, I wiped out the old HDFS directory structure and
> created an empty directory.  I ran a namenode format and then brought up the
> system with start-all.sh.
>    
>   I then set up a directory structure and ran the first job.  The job runs
> 100% of the map tasks, completes ~87% of the reduce tasks, and then hangs.
> There are no user-level error messages.  All systems go idle.
>    
>   I started looking at the Hadoop logs.  The first strange message is from the
> namenode log:
>    
> 2007-10-30 13:48:01,991 WARN org.apache.hadoop.dfs.StateChange: DIR* NameSystem.startFile: failed to create file /import/raw_logs/20070929/_task_200710301345_0001_r_000001_0/part-00001 for DFSClient_task_200710301345_0001_r_000001_0 on client 10.2.11.4 because current leaseholder is trying to recreate file.
> 2007-10-30 13:48:01,992 INFO org.apache.hadoop.ipc.Server: IPC Server handler 9 on 54310, call create(/import/raw_logs/20070929/_task_200710301345_0001_r_000001_0/part-00001, DFSClient_task_200710301345_0001_r_000001_0, true, 3, 67108864) from 10.2.11.4:34016: error: org.apache.hadoop.dfs.AlreadyBeingCreatedException: failed to create file /import/raw_logs/20070929/_task_200710301345_0001_r_000001_0/part-00001 for DFSClient_task_200710301345_0001_r_000001_0 on client 10.2.11.4 because current leaseholder is trying to recreate file.
> org.apache.hadoop.dfs.AlreadyBeingCreatedException: failed to create file /import/raw_logs/20070929/_task_200710301345_0001_r_000001_0/part-00001 for DFSClient_task_200710301345_0001_r_000001_0 on client 10.2.11.4 because current leaseholder is trying to recreate file.
>         at org.apache.hadoop.dfs.FSNamesystem.startFileInternal(FSNamesystem.java:788)
>         at org.apache.hadoop.dfs.FSNamesystem.startFile(FSNamesystem.java:725)
>         at org.apache.hadoop.dfs.NameNode.create(NameNode.java:307)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:340)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:566)
> 
>   The second strange message comes from the jobtracker log:
>   2007-10-30 13:59:26,190 INFO org.apache.hadoop.mapred.JobTracker: Ignoring
> 'duplicate' heartbeat from 'tracker_localhost.localdomain:50050'
>    
>   I'm not sure how to proceed.  I suspect that my code is OK, as I've run
> it numerous times in both single-node and multi-node grid environments.  I've
> never seen these error messages before.
>    
>   Any help would be much appreciated.
>    
>   Thanks,
>   C G 
> 
