[ https://issues.apache.org/jira/browse/HADOOP-1471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Raghu Angadi updated HADOOP-1471:
---------------------------------
        Component/s: dfs
      Fix Version/s: 0.14.0
  Affects Version/s: 0.12.3
        Description:

The patch submitted to HADOOP-893 (by me :( ) seems to have a bug in how it handles the set {{deadNodes}}. After the patch, {{seekToNewSource()}} looks like this:

{code}
public synchronized boolean seekToNewSource(long targetPos) throws IOException {
  boolean markedDead = deadNodes.contains(currentNode);
  deadNodes.add(currentNode);
  DatanodeInfo oldNode = currentNode;
  DatanodeInfo newNode = blockSeekTo(targetPos);
  if (!markedDead) {
    /* remove it from deadNodes. blockSeekTo could have cleared
     * deadNodes and added currentNode again. That's ok. */
    deadNodes.remove(oldNode);
  }
  // ...
{code}

I guess the expectation was that the caller of this function decides, before the call, whether to put the node in {{deadNodes}}. I am not sure whether this was a bug at the time, but it certainly seems to be a bug now: when there is a checksum error with replica1, we try replica2, and if there is a checksum error again, we try replica1 again!

Note that ChecksumFileSystem.java was created after HADOOP-893 was resolved.

> seekToNewSource() might not work correctly with Checksum failures.
> ------------------------------------------------------------------
>
>                 Key: HADOOP-1471
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1471
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.12.3
>            Reporter: Raghu Angadi
>            Fix For: 0.14.0

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
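The retry loop described above (replica1 -> replica2 -> replica1 again) can be reproduced with a minimal sketch. This is a hypothetical simplification, not the actual DFSInputStream code: nodes are plain strings, and {{blockSeekTo()}} is replaced by a stand-in that simply picks any replica not in {{deadNodes}}. Only the {{deadNodes}} bookkeeping from the patched {{seekToNewSource()}} is reproduced:

```java
import java.util.HashSet;
import java.util.Set;

// Minimal sketch (hypothetical names) of the deadNodes bookkeeping
// described in the issue: seekToNewSource() removes the failed node
// from deadNodes after switching away from it, so a second checksum
// failure can route the read right back to the first bad replica.
public class DeadNodesSketch {
    static Set<String> deadNodes = new HashSet<>();
    static String currentNode;

    // Stand-in for blockSeekTo(): pick any replica not marked dead.
    static String blockSeekTo(String[] replicas) {
        for (String r : replicas) {
            if (!deadNodes.contains(r)) {
                return r;
            }
        }
        return null;
    }

    // Mirrors the patched seekToNewSource() logic quoted above.
    static boolean seekToNewSource(String[] replicas) {
        boolean markedDead = deadNodes.contains(currentNode);
        deadNodes.add(currentNode);
        String oldNode = currentNode;
        currentNode = blockSeekTo(replicas);
        if (!markedDead) {
            // Removing oldNode here is what allows a later checksum
            // failure to select it again.
            deadNodes.remove(oldNode);
        }
        return currentNode != null;
    }

    public static void main(String[] args) {
        String[] replicas = {"replica1", "replica2"};
        currentNode = "replica1";
        seekToNewSource(replicas);        // checksum error on replica1
        System.out.println(currentNode);  // replica2
        seekToNewSource(replicas);        // checksum error on replica2
        System.out.println(currentNode);  // replica1 again -- the bug
    }
}
```

After the first call {{deadNodes}} is empty again (replica1 was removed), so the second failure happily hands the read back to replica1 instead of reporting that both replicas are bad.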