[
https://issues.apache.org/jira/browse/HADOOP-5286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676504#action_12676504
]
Hemanth Yamijala commented on HADOOP-5286:
------------------------------------------
We were asked a few questions offline; I'm posting them here along with my answers:
bq. What is start and end time of the 1.5 hour wait?
bq. Did the client read() block for 1.5 hours, or is it that it failed to read x
bytes in 1.5 hours? I understand that from the client's point of view these might
be the same, but for HDFS, the two are different.
I think your last point is relevant. Let me try to describe this a bit better.
I don't think a single call was blocked; i.e., a single read did not block for
1.5 hours. From the attached log and the relevant source code, I see that reads
are made at 3 different places:
bq. [2009-02-19 10:03:29] at
org.apache.hadoop.mapred.JobClient$RawSplit.readFields(JobClient.java:983)
bq. [2009-02-19 10:19:46] at
org.apache.hadoop.mapred.JobClient$RawSplit.readFields(JobClient.java:981)
bq. [2009-02-19 10:26:00,504] at
org.apache.hadoop.mapred.JobClient$RawSplit.readFields(JobClient.java:987)
Each of these calls ultimately results in a call to
org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1680).
These log entries come one after the other, so the first call went from 10:03
to 10:19, and so on. Also, I don't think any call actually failed in the end,
because the code (judging by the line numbers) progresses to make further read
calls. Had there been an IOException, it would have bailed out right then. So,
I believe the reads were happening, but very slowly.
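For illustration only, here is one way to tell these two cases apart on the client side. This wrapper is hypothetical (not part of Hadoop or this issue); it times each read() call through a plain java.io.FilterInputStream, so that many slow-but-progressing reads show up as a high read count with a modest slowest-read time, while a single blocked read shows up as one huge slowest-read time:

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical diagnostic wrapper: records how many reads succeed and how
// long the slowest individual read() call takes on the wrapped stream.
public class TimedInputStream extends FilterInputStream {
    private long slowestReadMillis = 0;
    private long successfulReads = 0;

    public TimedInputStream(InputStream in) {
        super(in);
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        long start = System.currentTimeMillis();
        int n = super.read(b, off, len);  // delegate to the wrapped stream
        long elapsed = System.currentTimeMillis() - start;
        if (n > 0) {
            successfulReads++;
        }
        if (elapsed > slowestReadMillis) {
            slowestReadMillis = elapsed;
        }
        return n;
    }

    public long getSlowestReadMillis() { return slowestReadMillis; }
    public long getSuccessfulReads() { return successfulReads; }

    public static void main(String[] args) throws IOException {
        // In real use, the wrapped stream would be the DFS input stream;
        // here an in-memory stream stands in for it.
        byte[] data = new byte[1 << 16];  // 64 KB
        try (TimedInputStream tis =
                 new TimedInputStream(new ByteArrayInputStream(data))) {
            byte[] buf = new byte[4096];
            while (tis.read(buf, 0, buf.length) != -1) {
                // drain the stream
            }
            System.out.println("reads=" + tis.getSuccessfulReads()
                + " slowest=" + tis.getSlowestReadMillis() + "ms");
        }
    }
}
```

With instrumentation like this around the split-file reads, the 1.5-hour gap between the stack traces above would resolve clearly into either one stuck call or many slow ones.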
> DFS client blocked for a long time reading blocks of a file on the JobTracker
> -----------------------------------------------------------------------------
>
> Key: HADOOP-5286
> URL: https://issues.apache.org/jira/browse/HADOOP-5286
> Project: Hadoop Core
> Issue Type: Bug
> Components: dfs
> Affects Versions: 0.20.0
> Reporter: Hemanth Yamijala
> Priority: Blocker
> Attachments: jt-log-for-blocked-reads.txt
>
>
> On a large cluster, we've observed that the DFS client was blocked reading a
> block of a file for almost one and a half hours. The file was being read by
> the JobTracker of the cluster, and was a split file of a job. In the NameNode
> logs, we observed the following message for the block:
> Inconsistent size for block blk_2044238107768440002_840946 reported from
> <ip>:<port> current size is 195072 reported size is 1318567
> Details follow.
>