[ https://issues.apache.org/jira/browse/HADOOP-3914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12641627#action_12641627 ]
Hairong Kuang commented on HADOOP-3914:
---------------------------------------

It looks like Hudson is not picking up available patches again, so I ran the tests on my local machine.

$ ant test-core
..
BUILD SUCCESSFUL
Total time: 106 minutes 33 seconds

$ ant test-patch
     [exec] -1 overall.
     [exec]     +1 @author.  The patch does not contain any @author tags.
     [exec]     -1 tests included.  The patch doesn't appear to include any new or modified tests.
     [exec]                         Please justify why no tests are needed for this patch.
     [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
     [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
     [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
     [exec]     +1 Eclipse classpath.  The patch retains Eclipse classpath integrity.

Since the main fix in this patch has already been running on production clusters, the patch is intentionally exempted from including a unit test.

> checksumOk implementation in DFSClient can break applications
> -------------------------------------------------------------
>
>                 Key: HADOOP-3914
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3914
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.17.1
>            Reporter: Christian Kunz
>            Assignee: Christian Kunz
>            Priority: Blocker
>             Fix For: 0.19.0
>
>         Attachments: checksumOk.patch, checksumOk1.patch, patch.HADOOP-3914
>
>
> One of our non-map-reduce applications (written in C and using libhdfs to
> access dfs) stopped working after the switch from 0.16 to 0.17.
> The problem was finally traced to failures in checksumOk.
> I assume the purpose of checksumOk is for a DFSClient to tell the sending
> Datanode that the checksum of the received block is okay. This is presumably
> useful in the replication pipeline.
> As implemented, checksumOk catches and ignores any IOException, probably
> because it is not essential for the client that the message succeeds.
> But this proved fatal for our application, because it links in a
> 3rd-party library that seems to catch socket exceptions before libhdfs does.
> Why was there an exception at all? In our case, the application reads a file
> into the local buffer of the DFSInputStream, which is large enough to hold all
> the data. The application reads to the end, and checksumOk is sent
> successfully at that point. The application then does other work and comes
> back to re-read the file (which is still open). That is when it calls
> checksumOk again and crashes.
> This can easily be avoided by adding a Boolean that ensures checksumOk is
> called exactly once, when EOS (end of stream) is encountered. Redundant calls
> to checksumOk do not seem to make sense anyway.
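The once-only guard suggested in the issue can be sketched in Java roughly as follows. This is a minimal, hypothetical illustration, not the committed patch or the real DFSClient code: the class names ChecksumOkOnceSketch and BlockReaderSketch, the method onEndOfStream, and the placeholder status value CHECKSUM_OK_STATUS are all invented for this example, and an in-memory ByteArrayOutputStream stands in for the socket to the Datanode.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;

/**
 * Hypothetical sketch of "send checksumOk exactly once at end of stream".
 * NOT the actual DFSClient implementation; all names here are illustrative.
 */
public class ChecksumOkOnceSketch {

    /** Placeholder status code (assumption); the real protocol value may differ. */
    static final short CHECKSUM_OK_STATUS = 5;

    static class BlockReaderSketch {
        private final DataOutputStream out;      // stands in for the socket output stream
        private boolean sentChecksumOk = false;  // the Boolean guard from the issue
        private int checksumOkWrites = 0;        // successful sends, counted for the demo

        BlockReaderSketch(OutputStream socketOut) {
            this.out = new DataOutputStream(socketOut);
        }

        /** Called whenever the reader encounters end of stream (EOS). */
        void onEndOfStream() {
            if (sentChecksumOk) {
                return; // already acknowledged: a re-read at EOS sends nothing
            }
            // Set the flag before writing, so a failed send is not retried later.
            sentChecksumOk = true;
            try {
                out.writeShort(CHECKSUM_OK_STATUS);
                out.flush();
                checksumOkWrites++;
            } catch (IOException e) {
                // Best effort: the acknowledgement is not essential to the client,
                // so the failure is ignored, as in the original implementation.
            }
        }

        int checksumOkWrites() {
            return checksumOkWrites;
        }
    }

    public static void main(String[] args) {
        ByteArrayOutputStream fakeSocket = new ByteArrayOutputStream();
        BlockReaderSketch reader = new BlockReaderSketch(fakeSocket);
        reader.onEndOfStream(); // first read reaches EOS: acknowledgement sent
        reader.onEndOfStream(); // re-read of the still-open file: no second write
        // Prints the number of acknowledgements written and the bytes sent.
        System.out.println(reader.checksumOkWrites() + " " + fakeSocket.size());
    }
}
```

Running main prints "1 2": one acknowledgement, written as a single 2-byte short, even though EOS was reached twice. Setting the flag before the write means a send that fails once is never attempted again on a later re-read, which fits the best-effort nature of the message described above.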