Thanks everyone for your help. It is running fine now.
On Fri, May 4, 2012 at 11:22 AM, Michael Segel <michael_se...@hotmail.com> wrote:

> Well
> That was one of the things I had asked.
> ulimit -a says it all.
>
> But you have to do this for the users... hdfs, mapred, and hadoop
>
> (Which is why I asked about which flavor.)
>
> On May 3, 2012, at 7:03 PM, Raj Vishwanathan wrote:
>
> > Keith
> >
> > What is the output of ulimit -n? Your value for the number of open
> > files is probably too low.
> >
> > Raj
> >
> >> ________________________________
> >> From: Keith Thompson <kthom...@binghamton.edu>
> >> To: common-user@hadoop.apache.org
> >> Sent: Thursday, May 3, 2012 4:33 PM
> >> Subject: Re: Reduce Hangs at 66%
> >>
> >> I am not sure about the ulimits, but I can answer the rest. It's a
> >> Cloudera distribution of Hadoop 0.20.2. The HDFS has 9 TB free. In the
> >> reduce step, I am taking keys in the form of (gridID, date), each with
> >> a value of 1. The reduce step just sums the 1's as the final output
> >> value for the key (it's counting how many people were in the gridID on
> >> a certain day).
> >>
> >> I have been running other, more complicated jobs with no problem, so
> >> I'm not sure why this one is being peculiar. This is the command I used
> >> to execute the program (the source is a file on HDFS):
> >>
> >> hadoop jar <jarfile> <driver> <source> /thompson/outputDensity/density1
> >>
> >> The program then executes the map and gets to 66% of the reduce, then
> >> stops responding. The cause of the error seems to be:
> >>
> >>> Error from attempt_201202240659_6432_r_000000_1: java.io.IOException:
> >>> The temporary job-output directory
> >>> hdfs://analytix1:9000/thompson/outputDensity/density1/_temporary
> >>> doesn't exist!
> >>
> >> I don't understand what the _temporary directory is. I am assuming it's
> >> something Hadoop creates automatically.
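For reference, a reducer of the kind described above, one that just sums the 1 values for each (gridID, date) key, would look roughly like the sketch below in the old org.apache.hadoop.mapred API that the stack traces later in the thread show this job using. The class and variable names are illustrative assumptions, not the actual job code:

    import java.io.IOException;
    import java.util.Iterator;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reducer;
    import org.apache.hadoop.mapred.Reporter;

    // Sketch only: sums the 1s emitted by the map step for each
    // "gridID,date" key, giving a per-grid, per-day person count.
    public class DensityReducer extends MapReduceBase
        implements Reducer<Text, IntWritable, Text, IntWritable> {

      public void reduce(Text key, Iterator<IntWritable> values,
          OutputCollector<Text, IntWritable> output, Reporter reporter)
          throws IOException {
        int sum = 0;
        while (values.hasNext()) {
          sum += values.next().get();
        }
        output.collect(key, new IntWritable(sum));
      }
    }

Nothing in a reducer this small should stall for 1800 seconds, which fits what the stack traces below show: the later attempts die in FileOutputCommitter.getWorkPath while opening the output writer, not inside the reduce logic itself.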
> >>
> >> On Thu, May 3, 2012 at 5:02 AM, Michel Segel <michael_se...@hotmail.com> wrote:
> >>
> >>> Well...
> >>> Lots of information but still missing some of the basics...
> >>>
> >>> Which release and version?
> >>> What are your ulimits set to?
> >>> How much free disk space do you have?
> >>> What are you attempting to do?
> >>>
> >>> Stuff like that.
> >>>
> >>> Sent from a remote device. Please excuse any typos...
> >>>
> >>> Mike Segel
> >>>
> >>> On May 2, 2012, at 4:49 PM, Keith Thompson <kthom...@binghamton.edu> wrote:
> >>>
> >>>> I am running a task which gets to 66% of the reduce step and then
> >>>> hangs indefinitely. Here is the log (I apologize if I am putting too
> >>>> much here, but I am not exactly sure what is relevant):
> >>>>
> >>>> 2012-05-02 16:42:52,975 INFO org.apache.hadoop.mapred.JobTracker:
> >>>> Adding task (REDUCE) 'attempt_201202240659_6433_r_000000_0' to tip
> >>>> task_201202240659_6433_r_000000, for tracker
> >>>> 'tracker_analytix7:localhost.localdomain/127.0.0.1:56515'
> >>>> 2012-05-02 16:42:53,584 INFO org.apache.hadoop.mapred.JobInProgress:
> >>>> Task 'attempt_201202240659_6433_m_000001_0' has completed
> >>>> task_201202240659_6433_m_000001 successfully.
> >>>> 2012-05-02 17:00:47,546 INFO org.apache.hadoop.mapred.TaskInProgress:
> >>>> Error from attempt_201202240659_6432_r_000000_0: Task
> >>>> attempt_201202240659_6432_r_000000_0 failed to report status for
> >>>> 1800 seconds. Killing!
> >>>> 2012-05-02 17:00:47,546 INFO org.apache.hadoop.mapred.JobTracker:
> >>>> Removing task 'attempt_201202240659_6432_r_000000_0'
> >>>> 2012-05-02 17:00:47,546 INFO org.apache.hadoop.mapred.JobTracker:
> >>>> Adding task (TASK_CLEANUP) 'attempt_201202240659_6432_r_000000_0' to
> >>>> tip task_201202240659_6432_r_000000, for tracker
> >>>> 'tracker_analytix4:localhost.localdomain/127.0.0.1:44204'
> >>>> 2012-05-02 17:00:48,763 INFO org.apache.hadoop.mapred.JobTracker:
> >>>> Removing task 'attempt_201202240659_6432_r_000000_0'
> >>>> 2012-05-02 17:00:48,957 INFO org.apache.hadoop.mapred.JobTracker:
> >>>> Adding task (REDUCE) 'attempt_201202240659_6432_r_000000_1' to tip
> >>>> task_201202240659_6432_r_000000, for tracker
> >>>> 'tracker_analytix5:localhost.localdomain/127.0.0.1:59117'
> >>>> 2012-05-02 17:00:56,559 INFO org.apache.hadoop.mapred.TaskInProgress:
> >>>> Error from attempt_201202240659_6432_r_000000_1: java.io.IOException:
> >>>> The temporary job-output directory
> >>>> hdfs://analytix1:9000/thompson/outputDensity/density1/_temporary
> >>>> doesn't exist!
> >>>>     at org.apache.hadoop.mapred.FileOutputCommitter.getWorkPath(FileOutputCommitter.java:250)
> >>>>     at org.apache.hadoop.mapred.FileOutputFormat.getTaskOutputPath(FileOutputFormat.java:240)
> >>>>     at org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:116)
> >>>>     at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:438)
> >>>>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:416)
> >>>>     at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
> >>>>     at java.security.AccessController.doPrivileged(Native Method)
> >>>>     at javax.security.auth.Subject.doAs(Subject.java:396)
> >>>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
> >>>>     at org.apache.hadoop.mapred.Child.main(Child.java:262)
> >>>>
> >>>> 2012-05-02 17:00:59,903 INFO org.apache.hadoop.mapred.JobTracker:
> >>>> Removing task 'attempt_201202240659_6432_r_000000_1'
> >>>> 2012-05-02 17:00:59,906 INFO org.apache.hadoop.mapred.JobTracker:
> >>>> Adding task (REDUCE) 'attempt_201202240659_6432_r_000000_2' to tip
> >>>> task_201202240659_6432_r_000000, for tracker
> >>>> 'tracker_analytix3:localhost.localdomain/127.0.0.1:39980'
> >>>> 2012-05-02 17:01:07,200 INFO org.apache.hadoop.mapred.TaskInProgress:
> >>>> Error from attempt_201202240659_6432_r_000000_2: java.io.IOException:
> >>>> The temporary job-output directory
> >>>> hdfs://analytix1:9000/thompson/outputDensity/density1/_temporary
> >>>> doesn't exist!
> >>>>     at org.apache.hadoop.mapred.FileOutputCommitter.getWorkPath(FileOutputCommitter.java:250)
> >>>>     at org.apache.hadoop.mapred.FileOutputFormat.getTaskOutputPath(FileOutputFormat.java:240)
> >>>>     at org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:116)
> >>>>     at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:438)
> >>>>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:416)
> >>>>     at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
> >>>>     at java.security.AccessController.doPrivileged(Native Method)
> >>>>     at javax.security.auth.Subject.doAs(Subject.java:396)
> >>>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
> >>>>     at org.apache.hadoop.mapred.Child.main(Child.java:262)
> >>>>
> >>>> 2012-05-02 17:01:10,239 INFO org.apache.hadoop.mapred.JobTracker:
> >>>> Removing task 'attempt_201202240659_6432_r_000000_2'
> >>>> 2012-05-02 17:01:10,283 INFO org.apache.hadoop.mapred.JobTracker:
> >>>> Adding task (REDUCE) 'attempt_201202240659_6432_r_000000_3' to tip
> >>>> task_201202240659_6432_r_000000, for tracker
> >>>> 'tracker_analytix2:localhost.localdomain/127.0.0.1:33297'
> >>>> 2012-05-02 17:01:18,188 INFO org.apache.hadoop.mapred.TaskInProgress:
> >>>> Error from attempt_201202240659_6432_r_000000_3: java.io.IOException:
> >>>> The temporary job-output directory
> >>>> hdfs://analytix1:9000/thompson/outputDensity/density1/_temporary
> >>>> doesn't exist!
> >>>>     at org.apache.hadoop.mapred.FileOutputCommitter.getWorkPath(FileOutputCommitter.java:250)
> >>>>     at org.apache.hadoop.mapred.FileOutputFormat.getTaskOutputPath(FileOutputFormat.java:240)
> >>>>     at org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:116)
> >>>>     at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:438)
> >>>>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:416)
> >>>>     at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
> >>>>     at java.security.AccessController.doPrivileged(Native Method)
> >>>>     at javax.security.auth.Subject.doAs(Subject.java:396)
> >>>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
> >>>>     at org.apache.hadoop.mapred.Child.main(Child.java:262)
> >>>>
> >>>> 2012-05-02 17:01:21,228 INFO org.apache.hadoop.mapred.TaskInProgress:
> >>>> TaskInProgress task_201202240659_6432_r_000000 has failed 4 times.
> >>>> 2012-05-02 17:01:21,228 INFO org.apache.hadoop.mapred.JobInProgress:
> >>>> Aborting job job_201202240659_6432
> >>>> 2012-05-02 17:01:21,228 INFO org.apache.hadoop.mapred.JobInProgress:
> >>>> Killing job 'job_201202240659_6432'
> >>>> 2012-05-02 17:01:21,228 INFO org.apache.hadoop.mapred.JobTracker:
> >>>> Adding task (JOB_CLEANUP) 'attempt_201202240659_6432_m_000002_0' to
> >>>> tip task_201202240659_6432_m_000002, for tracker
> >>>> 'tracker_analytix2:localhost.localdomain/127.0.0.1:33297'
> >>>> 2012-05-02 17:01:21,228 INFO org.apache.hadoop.mapred.JobTracker:
> >>>> Removing task 'attempt_201202240659_6432_r_000000_3'
> >>>> 2012-05-02 17:01:22,443 INFO org.apache.hadoop.mapred.JobInProgress:
> >>>> Task 'attempt_201202240659_6432_m_000002_0' has completed
> >>>> task_201202240659_6432_m_000002 successfully.
> >>>> 2012-05-02 17:01:22,443 INFO
> >>>> org.apache.hadoop.mapred.JobInProgress$JobSummary:
> >>>> jobId=job_201202240659_6432,submitTime=1335990629637,launchTime=1335990629702,firstMapTaskLaunchTime=1335990631252,firstReduceTaskLaunchTime=1335990633988,firstJobSetupTaskLaunchTime=1335990629727,firstJobCleanupTaskLaunchTime=1335992481228,finishTime=1335992482443,numMaps=2,numSlotsPerMap=1,numReduces=1,numSlotsPerReduce=1,user=thompson,queue=default,status=FAILED,mapSlotSeconds=7,reduceSlotsSeconds=32,clusterMapCapacity=138,clusterReduceCapacity=138
> >>>> 2012-05-02 17:01:22,446 WARN org.apache.hadoop.hdfs.DFSClient:
> >>>> DataStreamer Exception: org.apache.hadoop.ipc.RemoteException:
> >>>> org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease
> >>>> on /thompson/outputDensity/density1/_logs/history/analytix1_1330084785363_job_201202240659_6432_thompson_TwitterDensity
> >>>> File does not exist. Holder DFSClient_1931891231 does not have any
> >>>> open files.
> >>>>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1542)
> >>>>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1533)
> >>>>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1449)
> >>>>     at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:649)
> >>>>     at sun.reflect.GeneratedMethodAccessor63.invoke(Unknown Source)
> >>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >>>>     at java.lang.reflect.Method.invoke(Method.java:597)
> >>>>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:557)
> >>>>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1415)
> >>>>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1411)
> >>>>     at java.security.AccessController.doPrivileged(Native Method)
> >>>>     at javax.security.auth.Subject.doAs(Subject.java:396)
> >>>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
> >>>>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1409)
> >>>>
> >>>>     at org.apache.hadoop.ipc.Client.call(Client.java:1104)
> >>>>     at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:226)
> >>>>     at $Proxy4.addBlock(Unknown Source)
> >>>>     at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
> >>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >>>>     at java.lang.reflect.Method.invoke(Method.java:597)
> >>>>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
> >>>>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
> >>>>     at $Proxy4.addBlock(Unknown Source)
> >>>>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:3185)
> >>>>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:3055)
> >>>>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1900(DFSClient.java:2305)
> >>>>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2500)
> >>>>
> >>>> 2012-05-02 17:01:22,446 WARN org.apache.hadoop.hdfs.DFSClient: Error
> >>>> Recovery for block null bad datanode[0] nodes == null
> >>>> 2012-05-02 17:01:22,446 WARN org.apache.hadoop.hdfs.DFSClient: Could
> >>>> not get block locations. Source file
> >>>> "/thompson/outputDensity/density1/_logs/history/analytix1_1330084785363_job_201202240659_6432_thompson_TwitterDensity"
> >>>> - Aborting...
> >>>> 2012-05-02 17:01:22,446 INFO org.apache.hadoop.mapred.JobHistory:
> >>>> Moving file:/var/log/hadoop/history/analytix1_1330084785363_job_201202240659_6432_thompson_TwitterDensity
> >>>> to file:/var/log/hadoop/history/done
> >>>> 2012-05-02 17:01:22,447 ERROR org.apache.hadoop.hdfs.DFSClient:
> >>>> Exception closing file
> >>>> /thompson/outputDensity/density1/_logs/history/analytix1_1330084785363_job_201202240659_6432_thompson_TwitterDensity
> >>>> : org.apache.hadoop.ipc.RemoteException:
> >>>> org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease
> >>>> on /thompson/outputDensity/density1/_logs/history/analytix1_1330084785363_job_201202240659_6432_thompson_TwitterDensity
> >>>> File does not exist. Holder DFSClient_1931891231 does not have any
> >>>> open files.
> >>>>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1542)
> >>>>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1533)
> >>>>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1449)
> >>>>     at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:649)
> >>>>     at sun.reflect.GeneratedMethodAccessor63.invoke(Unknown Source)
> >>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >>>>     at java.lang.reflect.Method.invoke(Method.java:597)
> >>>>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:557)
> >>>>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1415)
> >>>>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1411)
> >>>>     at java.security.AccessController.doPrivileged(Native Method)
> >>>>     at javax.security.auth.Subject.doAs(Subject.java:396)
> >>>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
> >>>>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1409)
> >>>>
> >>>> Can anyone explain to me what is causing the problem?
> >>>>
> >>>> --
> >>>> *Keith Thompson*
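About the _temporary directory Keith asks after: in Hadoop 0.20.x, FileOutputCommitter creates <output>/_temporary when the job is set up, tasks write their in-flight output beneath it, and the directory is deleted when the job commits or aborts. "The temporary job-output directory ... doesn't exist" therefore means something removed the output directory, or its _temporary child, while the job was still running. A classic trigger is two jobs (or a retried run of the same driver) writing to the same output path, since whichever finishes first deletes _temporary out from under the other; note that the log above shows tasks from two jobs, 6432 and 6433, in flight at the same time. Below is a minimal old-API driver sketch that keeps output paths unique per run. The class names are the illustrative ones from the reducer sketch earlier, not the real job's, and the timestamp suffix is just one way to avoid a shared directory:

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class DensityDriver {
      public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(DensityDriver.class);
        conf.setJobName("TwitterDensity");
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);
        // Mapper setup elided; the real job's mapper emits ("gridID,date", 1).
        conf.setReducerClass(DensityReducer.class);
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        // FileOutputCommitter creates <output>/_temporary at job setup and
        // deletes it at commit/abort, so no other live job (and no retry of
        // this one) should ever point at the same output path.
        Path out = new Path(args[1] + "-" + System.currentTimeMillis());
        FileOutputFormat.setOutputPath(conf, out);
        JobClient.runJob(conf);
      }
    }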
> >>
> >> --
> >> *Keith Thompson*
> >> Graduate Research Associate, Xerox Corporation
> >> SUNY Research Foundation
> >> Dept. of Systems Science and Industrial Engineering
> >> Binghamton University
> >> Binghamton, NY 13902

--
*Keith Thompson*
Graduate Research Associate, Xerox Corporation
SUNY Research Foundation
Dept. of Systems Science and Industrial Engineering
Binghamton University
Binghamton, NY 13902