[ https://issues.apache.org/jira/browse/HADOOP-5903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chris Douglas resolved HADOOP-5903.
-----------------------------------

    Resolution: Duplicate

Duplicate of HADOOP-3185, HADOOP-4681

> DFSClient "Could not obtain block:..."
> --------------------------------------
>
>                 Key: HADOOP-5903
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5903
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.18.3, 0.19.0, 0.19.1, 0.20.0
>            Reporter: stack
>
> We see this frequently in our application, hbase, where dfsclients are held
> open across long periods of time. It would seem that any hiccup fetching a
> block becomes a permanent black mark: even after the serving datanode comes
> out of a temporary slowness or outage, the dfsclient never seems to pick up
> on this fact. Our perception is that the dfsclient is too sensitive to the
> vagaries of cluster comings and goings and succumbs too easily, especially
> given that a fresh dfsclient has no problem fetching the designated block.
>
> Chatting with Raghu and Hairong yesterday, Hairong pointed out that the
> dfsclient frequently updates its list of block locations -- if a block has
> moved or if a datanode is dead, then the dfsclient should be keeping up with
> the changing state of the cluster (I see this happening in
> DFSClient#chooseDatanode on failure). But Raghu looks to have put his finger
> on our problem by noticing that the failures count is only incremented --
> never decremented. ANY three failures, no matter how many blocks are in a
> file, and even if a block that failed once now works, are enough for the
> DFSClient to start throwing "Could not obtain block:...".
>
> The failures counter needs to be a little smarter. Would a patch that adds a
> map of blocks to failure counts be the right way to go? Failures should note
> the datanode that the failure was gotten against so that if the datanode
> came online again (retry), we could decrement the mark that had been made
> against the block.
>
> What do folks think?
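
For illustration only, the per-block failure map suggested in the issue could look roughly like the sketch below. It is not the actual DFSClient code and not the fix that landed in HADOOP-3185 or HADOOP-4681. The class name BlockFailureTracker, its methods, and the use of plain strings for block and datanode identifiers are all hypothetical. The threshold of three mirrors the "ANY three failures" behaviour described in the report.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: tracks read failures per block and per datanode so
// that a later success against a datanode can clear the marks it caused.
class BlockFailureTracker {
    private final int maxFailures;
    // block id -> (datanode id -> number of failures seen against it)
    private final Map<String, Map<String, Integer>> failures = new HashMap<>();

    BlockFailureTracker(int maxFailures) {
        this.maxFailures = maxFailures;
    }

    // Record one failed read of blockId from datanode.
    void recordFailure(String blockId, String datanode) {
        failures.computeIfAbsent(blockId, k -> new HashMap<>())
                .merge(datanode, 1, Integer::sum);
    }

    // A successful read from datanode clears the marks it had made against
    // blockId; marks made by other datanodes are left untouched.
    void recordSuccess(String blockId, String datanode) {
        Map<String, Integer> perNode = failures.get(blockId);
        if (perNode == null) {
            return;
        }
        perNode.remove(datanode);
        if (perNode.isEmpty()) {
            failures.remove(blockId);
        }
    }

    // Total outstanding failure marks for one block.
    int failureCount(String blockId) {
        Map<String, Integer> perNode = failures.get(blockId);
        if (perNode == null) {
            return 0;
        }
        int total = 0;
        for (int count : perNode.values()) {
            total += count;
        }
        return total;
    }

    // True once this block alone has accumulated maxFailures marks.
    boolean shouldGiveUp(String blockId) {
        return failureCount(blockId) >= maxFailures;
    }
}
```

With a tracker built as new BlockFailureTracker(3), failures on other blocks in the same file no longer count toward the limit for a given block, and a datanode that recovers stops counting against the blocks it had failed to serve.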