[
https://issues.apache.org/jira/browse/HADOOP-5903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Chris Douglas resolved HADOOP-5903.
-----------------------------------
Resolution: Duplicate
Duplicate of HADOOP-3185, HADOOP-4681
> DFSClient "Could not obtain block:..."
> --------------------------------------
>
> Key: HADOOP-5903
> URL: https://issues.apache.org/jira/browse/HADOOP-5903
> Project: Hadoop Core
> Issue Type: Bug
> Components: dfs
> Affects Versions: 0.18.3, 0.19.0, 0.19.1, 0.20.0
> Reporter: stack
>
> We see this frequently in our application, hbase, where dfsclients are held
> open across long periods of time. It would seem that any hiccup fetching a
> block becomes a permanent black mark: even after the serving datanode comes
> out of a temporary slowness or outage, the dfsclient never seems to pick up
> on this fact. Our perception is that the dfsclient is too sensitive to the
> vagaries of cluster comings and goings and succumbs too easily, especially
> given that a fresh dfsclient has no problem fetching the designated block.
> Chatting with Raghu and Hairong yesterday, Hairong pointed out that the
> dfsclient frequently updates its list of block locations -- if a block has
> moved or if a datanode is dead, then dfsclient should be keeping up with the
> changing state of the cluster (I see this happening in
> DFSClient#chooseDatanode on failure) but Raghu looks to have put his finger
> on our problem by noticing that the failures count is only incremented --
> never decremented. ANY three failures, no matter how many blocks are in the
> file and even if a block that failed once now works, are enough for the
> DFSClient to start throwing "Could not obtain block:...".
> The failures counter needs to be a little smarter. Would a patch that adds a
> map of blocks to failure counts be the right way to go? Failures should note
> the datanode that the failure occurred against so that if the datanode came
> online again (retry), we could decrement the mark that had been made against
> the block.
> What do folks think?
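
For illustration only, the per-block bookkeeping proposed above might look
roughly like the sketch below. This is not the DFSClient code or an actual
patch: the class BlockFailureTracker, its method names, and the use of plain
long block ids and String datanode names are all assumptions made for the
example. It also assumes that repeated failures against the same datanode
count only once per block, which is one possible reading of the proposal.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of DFSClient: tracks read failures per block
// and per datanode instead of in a single counter that only ever grows.
class BlockFailureTracker {
    private final int maxFailuresPerBlock;

    // block id -> datanodes that currently carry a failure mark for that block
    private final Map<Long, Set<String>> failedNodesByBlock = new HashMap<>();

    BlockFailureTracker(int maxFailuresPerBlock) {
        this.maxFailuresPerBlock = maxFailuresPerBlock;
    }

    // Mark a failed read of blockId against the given datanode.
    synchronized void recordFailure(long blockId, String datanode) {
        failedNodesByBlock
            .computeIfAbsent(blockId, k -> new HashSet<>())
            .add(datanode);
    }

    // A later successful read from the same datanode clears its mark,
    // so a transient outage does not count against the block forever.
    synchronized void recordSuccess(long blockId, String datanode) {
        Set<String> nodes = failedNodesByBlock.get(blockId);
        if (nodes != null) {
            nodes.remove(datanode);
            if (nodes.isEmpty()) {
                failedNodesByBlock.remove(blockId);
            }
        }
    }

    // Number of distinct datanodes currently marked as failed for blockId.
    synchronized int failureCount(long blockId) {
        Set<String> nodes = failedNodesByBlock.get(blockId);
        return nodes == null ? 0 : nodes.size();
    }

    // Give up on a block only when that block alone has reached the limit.
    synchronized boolean shouldGiveUp(long blockId) {
        return failureCount(blockId) >= maxFailuresPerBlock;
    }
}
```

With a tracker like this, failures spread across different blocks of a file
no longer add up toward a single limit, and a datanode that recovers stops
counting against the block once a read from it succeeds.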