[ https://issues.apache.org/jira/browse/HADOOP-3357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Amir Youssefi updated HADOOP-3357:
----------------------------------
Description:
I had a case where the JobTracker was trying to delete some files in a DFS
directory as part of garbage collection for a job. The thread hung; this is
the trace:
Thread 19 (IPC Server handler 5 on 57344):
State: WAITING
Blocked count: 137022
Waited count: 336004
Waiting on [EMAIL PROTECTED]
Stack:
java.lang.Object.wait(Native Method)
java.lang.Object.wait(Object.java:485)
org.apache.hadoop.ipc.Client.call(Client.java:683)
org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
org.apache.hadoop.dfs.$Proxy4.delete(Unknown Source)
sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
java.lang.reflect.Method.invoke(Method.java:597)
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
org.apache.hadoop.dfs.$Proxy4.delete(Unknown Source)
org.apache.hadoop.dfs.DFSClient.delete(DFSClient.java:515)
org.apache.hadoop.dfs.DistributedFileSystem.delete(DistributedFileSystem.java:170)
org.apache.hadoop.fs.FileUtil.fullyDelete(FileUtil.java:118)
org.apache.hadoop.fs.FileUtil.fullyDelete(FileUtil.java:114)
org.apache.hadoop.mapred.JobInProgress.garbageCollect(JobInProgress.java:1635)
org.apache.hadoop.mapred.JobInProgress.isJobComplete(JobInProgress.java:1387)
org.apache.hadoop.mapred.JobInProgress.completedTask(JobInProgress.java:1348)
org.apache.hadoop.mapred.JobInProgress.updateTaskStatus(JobInProgress.java:565)
org.apache.hadoop.mapred.JobTracker.updateTaskStatuses(JobTracker.java:2032)
The call hung for a very long time (about 1 hour).
I am not sure whether this helps, but I saw the following message in the
NameNode log around the time the JobTracker issued the delete:
2008-05-07 09:55:57,375 WARN org.apache.hadoop.dfs.StateChange: DIR*
FSDirectory.unprotectedDelete: failed to remove
/mapredsystem/ddas/mapredsystem/10091.{running.machine.com}/job_200805070458_0004
because it does not exist
I also checked that the directory in question actually existed (the job
could not have run without it).
was:
I had a case where the JobTracker was trying to delete some files in a DFS
directory as part of garbage collection for a job. The thread hung; this is
the trace:
Thread 19 (IPC Server handler 5 on 57344):
State: WAITING
Blocked count: 137022
Waited count: 336004
Waiting on [EMAIL PROTECTED]
Stack:
java.lang.Object.wait(Native Method)
java.lang.Object.wait(Object.java:485)
org.apache.hadoop.ipc.Client.call(Client.java:683)
org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
org.apache.hadoop.dfs.$Proxy4.delete(Unknown Source)
sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
java.lang.reflect.Method.invoke(Method.java:597)
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
org.apache.hadoop.dfs.$Proxy4.delete(Unknown Source)
org.apache.hadoop.dfs.DFSClient.delete(DFSClient.java:515)
org.apache.hadoop.dfs.DistributedFileSystem.delete(DistributedFileSystem.java:170)
org.apache.hadoop.fs.FileUtil.fullyDelete(FileUtil.java:118)
org.apache.hadoop.fs.FileUtil.fullyDelete(FileUtil.java:114)
org.apache.hadoop.mapred.JobInProgress.garbageCollect(JobInProgress.java:1635)
org.apache.hadoop.mapred.JobInProgress.isJobComplete(JobInProgress.java:1387)
org.apache.hadoop.mapred.JobInProgress.completedTask(JobInProgress.java:1348)
org.apache.hadoop.mapred.JobInProgress.updateTaskStatus(JobInProgress.java:565)
org.apache.hadoop.mapred.JobTracker.updateTaskStatuses(JobTracker.java:2032)
The call hung for a very long time (about 1 hour).
I am not sure whether this helps, but I saw the following message in the
NameNode log around the time the JobTracker issued the delete:
2008-05-07 09:55:57,375 WARN org.apache.hadoop.dfs.StateChange: DIR*
FSDirectory.unprotectedDelete: failed to remove
/mapredsystem/ddas/mapredsystem/10091.gs301249.inktomisearch.com/job_200805070458_0004
because it does not exist
I also checked that the directory in question actually existed (the job
could not have run without it).
> delete on dfs hung
> ------------------
>
> Key: HADOOP-3357
> URL: https://issues.apache.org/jira/browse/HADOOP-3357
> Project: Hadoop Core
> Issue Type: Bug
> Components: dfs
> Affects Versions: 0.18.0
> Reporter: Devaraj Das
>
> I had a case where the JobTracker was trying to delete some files in a DFS
> directory as part of garbage collection for a job. The thread hung; this is
> the trace:
> Thread 19 (IPC Server handler 5 on 57344):
> State: WAITING
> Blocked count: 137022
> Waited count: 336004
> Waiting on [EMAIL PROTECTED]
> Stack:
> java.lang.Object.wait(Native Method)
> java.lang.Object.wait(Object.java:485)
> org.apache.hadoop.ipc.Client.call(Client.java:683)
> org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
> org.apache.hadoop.dfs.$Proxy4.delete(Unknown Source)
> sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> java.lang.reflect.Method.invoke(Method.java:597)
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
> org.apache.hadoop.dfs.$Proxy4.delete(Unknown Source)
> org.apache.hadoop.dfs.DFSClient.delete(DFSClient.java:515)
> org.apache.hadoop.dfs.DistributedFileSystem.delete(DistributedFileSystem.java:170)
> org.apache.hadoop.fs.FileUtil.fullyDelete(FileUtil.java:118)
> org.apache.hadoop.fs.FileUtil.fullyDelete(FileUtil.java:114)
> org.apache.hadoop.mapred.JobInProgress.garbageCollect(JobInProgress.java:1635)
> org.apache.hadoop.mapred.JobInProgress.isJobComplete(JobInProgress.java:1387)
> org.apache.hadoop.mapred.JobInProgress.completedTask(JobInProgress.java:1348)
> org.apache.hadoop.mapred.JobInProgress.updateTaskStatus(JobInProgress.java:565)
> org.apache.hadoop.mapred.JobTracker.updateTaskStatuses(JobTracker.java:2032)
> The call hung for a very long time (about 1 hour).
> I am not sure whether this helps, but I saw the following message in the
> NameNode log around the time the JobTracker issued the delete:
> 2008-05-07 09:55:57,375 WARN org.apache.hadoop.dfs.StateChange: DIR*
> FSDirectory.unprotectedDelete: failed to remove
> /mapredsystem/ddas/mapredsystem/10091.{running.machine.com}/job_200805070458_0004
> because it does not exist
> I also checked that the directory in question actually existed (the job
> could not have run without it).
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.