[ https://issues.apache.org/jira/browse/HADOOP-3914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12645288#action_12645288 ]
Raghu Angadi commented on HADOOP-3914:
--------------------------------------

bq. We applied this patch on a machine that was failing to read a directory of files (reads of the individual files were fine): hadoop dfs -text path_to_directory/'*'

You need HADOOP-4499.

> checksumOk implementation in DFSClient can break applications
> -------------------------------------------------------------
>
>                 Key: HADOOP-3914
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3914
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.17.1
>            Reporter: Christian Kunz
>            Assignee: Christian Kunz
>            Priority: Blocker
>             Fix For: 0.18.2
>
>         Attachments: checksumOk.patch, checksumOk1-br18.patch,
> checksumOk1.patch, patch.HADOOP-3914
>
>
> One of our non-map-reduce applications (written in C and using libhdfs to
> access DFS) stopped working after the switch from 0.16 to 0.17.
> The problem was finally traced to failures in checksumOk.
> I would assume the purpose of checksumOk is for a DFSClient to indicate to a
> sending Datanode that the checksum of the received block is okay. This must
> be useful in the replication pipeline.
> checksumOk is implemented so that any IOException is caught and ignored,
> probably because it is not essential for the client that the message
> succeeds.
> But this proved fatal for our application, because it links in a
> 3rd-party library that seems to catch socket exceptions before libhdfs does.
> Why was there an exception? In our case the application reads a file into the
> local buffer of the DFSInputStream, which is large enough to hold all the
> data. The application reads to the end, and checksumOk is sent successfully
> at that point. But then the application does some other work and comes back
> to re-read the file (still open). That is when it calls checksumOk again and
> crashes.
> This can easily be avoided by adding a boolean flag ensuring that checksumOk
> is called exactly once, when EOS is encountered. Redundant calls to
> checksumOk do not seem to make sense anyway.