[jira] Resolved: (HADOOP-5903) DFSClient "Could not obtain block:..."

Chris Douglas (JIRA) Sat, 23 May 2009 20:23:10 -0700

     [ 
https://issues.apache.org/jira/browse/HADOOP-5903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Chris Douglas resolved HADOOP-5903.
-----------------------------------

    Resolution: Duplicate

Duplicate of HADOOP-3185, HADOOP-4681

> DFSClient "Could not obtain block:..."
> --------------------------------------
>
>                 Key: HADOOP-5903
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5903
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.18.3, 0.19.0, 0.19.1, 0.20.0
>            Reporter: stack
>
> We see this frequently in our application, hbase, where dfsclients are held 
> open across long periods of time. It would seem that any hiccup fetching a 
> block becomes a permanent black mark, and though the serving datanode passes 
> out of a temporary slowness or outage, the dfsclient never seems to pick up 
> on this fact.  Our perception is that the dfsclient is too sensitive to the 
> vagaries of cluster comings and goings and succumbs too easily, especially 
> given that a fresh dfsclient has no problem fetching the designated block.
> Chatting with Raghu and Hairong yesterday, Hairong pointed out that the 
> dfsclient frequently updates its list of block locations -- if a block has 
> moved or if a datanode is dead, then dfsclient should be keeping up with the 
> changing state of the cluster (I see this happening in 
> DFSClient#chooseDatanode on failure) but Raghu looks like he put his finger 
> on our problem by noticing that the failures count is only incremented -- 
> never decremented.  ANY three failures, no matter how many blocks in a file 
> nor that a block that failed once now works, are enough for the DFSClient to 
> start throwing "Could not obtain block:...".
> The failures counter needs to be a little smarter.  Would a patch that adds a 
> map of blocks to failure counts be the right way to go?  Failures should note 
> the datanode that the failure occurred against so that if the datanode came 
> online again (retry), we could decrement the mark that had been made against 
> the block?
> What do folks think?
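
For illustration only, the per-block bookkeeping proposed in the description could be sketched as below. This is not code from DFSClient or from any patch on this issue. The class BlockFailureTracker, its method names, the use of a long block id and a String datanode name, and the threshold passed to the constructor are all hypothetical. It keeps a map from block to per-datanode failure marks, and a later successful read from a datanode clears the marks that datanode had against that block.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of per-block failure tracking.
 * Not part of DFSClient; all names here are assumptions.
 */
public class BlockFailureTracker {
    // Failure marks allowed against one block before giving up (assumed policy).
    private final int maxFailuresPerBlock;

    // block id -> (datanode name -> failures recorded against that datanode)
    private final Map<Long, Map<String, Integer>> failures = new HashMap<>();

    public BlockFailureTracker(int maxFailuresPerBlock) {
        this.maxFailuresPerBlock = maxFailuresPerBlock;
    }

    /** Record one failed attempt to read blockId from datanode. */
    public synchronized void recordFailure(long blockId, String datanode) {
        failures.computeIfAbsent(blockId, k -> new HashMap<>())
                .merge(datanode, 1, Integer::sum);
    }

    /** A read from datanode worked again: drop its marks against blockId. */
    public synchronized void recordSuccess(long blockId, String datanode) {
        Map<String, Integer> perNode = failures.get(blockId);
        if (perNode == null) {
            return; // nothing was ever recorded for this block
        }
        perNode.remove(datanode);
        if (perNode.isEmpty()) {
            failures.remove(blockId); // keep the map from growing without bound
        }
    }

    /** Total outstanding failure marks against blockId. */
    public synchronized int failureCount(long blockId) {
        Map<String, Integer> perNode = failures.get(blockId);
        if (perNode == null) {
            return 0;
        }
        int total = 0;
        for (int count : perNode.values()) {
            total += count;
        }
        return total;
    }

    /** True once blockId has accumulated the configured number of marks. */
    public synchronized boolean shouldGiveUp(long blockId) {
        return failureCount(blockId) >= maxFailuresPerBlock;
    }
}
```

With this shape, failures against one block do not count against another, and a datanode that recovers no longer contributes to the decision to throw "Could not obtain block:...". Whether marks should also expire with time is left open here.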

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
