[ https://issues.apache.org/jira/browse/HDFS-2130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13140609#comment-13140609 ]
Todd Lipcon commented on HDFS-2130:
-----------------------------------

Turns out this is actually fairly difficult. The reason is that the checksumming is done at the DFSOutputStream layer, rather than the DataStreamer layer. So the checksum algorithm and chunk size need to be known _before_ the output stream connects to the datanode.

Here are a few possible solutions:

1) When append() is called, make an RPC to the datanode hosting the last block of the file. This RPC will read the meta header and return the correct checksum. The DFSOutputStream then adopts that checksum.
Advantages:
- fairly simple to implement.
- allows switching checksum type *and* chunk size.
Disadvantages:
- an extra round-trip to set up the pipeline for append.

2) In the case of append, the DN can allow a writer to use a different checksum _algorithm_ so long as the chunk size and checksum size are the same. In this case, it will verify the incoming packets using the writer's algorithm, then re-checksum them using the disk algorithm before writing to the meta file.
Advantages:
- no extra round-trip on pipeline creation.
- no need to change client code.
- when the client transitions to the next block in a file being appended, the new (preferred) checksum is used.
Disadvantages:
- there's a slight performance hit when filling up the last block of a file being appended.
- not a general solution (it only supports changing the polynomial, not the chunk size).

Any other ideas?

> Switch default checksum to CRC32C
> ---------------------------------
>
> Key: HDFS-2130
> URL: https://issues.apache.org/jira/browse/HDFS-2130
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: hdfs client
> Reporter: Todd Lipcon
>
> Once the other subtasks/parts of HDFS-2080 are complete, CRC32C will be a
> much more efficient checksum algorithm than CRC32. Hence we should change the
> default checksum to CRC32C.
>
> However, in order to continue to support append against blocks created with
> the old checksum, we will need to implement some kind of handshaking in the
> write pipeline.
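Option 1 (look up the last block's checksum before building the append pipeline) amounts to a small handshake. The Java sketch below is illustrative only and is not Hadoop's actual BlockMetadataHeader or DataChecksum code. It assumes a meta header laid out as a 2-byte version, a 1-byte checksum type id, and a 4-byte bytes-per-checksum value, with hypothetical ids 1 = CRC32 and 2 = CRC32C. The RPC transport is omitted, and `ChecksumSpec`, `readMetaHeader`, and `newChecksum` are hypothetical names. The datanode parses the header and the client builds a matching `java.util.zip.Checksum`.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;
import java.util.zip.CRC32C;
import java.util.zip.Checksum;

public class AppendChecksumHandshake {
    // Assumed type ids in the meta header, for illustration only.
    static final int TYPE_CRC32 = 1;
    static final int TYPE_CRC32C = 2;

    /** What a hypothetical datanode RPC would return: algorithm id plus chunk size. */
    static final class ChecksumSpec {
        final int type;
        final int bytesPerChecksum;

        ChecksumSpec(int type, int bytesPerChecksum) {
            this.type = type;
            this.bytesPerChecksum = bytesPerChecksum;
        }
    }

    /** Datanode side: parse the assumed header (2-byte version, 1-byte type, 4-byte chunk size). */
    static ChecksumSpec readMetaHeader(byte[] header) {
        ByteBuffer buf = ByteBuffer.wrap(header); // big-endian by default
        buf.getShort();                           // skip the header version
        int type = buf.get() & 0xFF;
        int bytesPerChecksum = buf.getInt();
        return new ChecksumSpec(type, bytesPerChecksum);
    }

    /** Client side: adopt the algorithm the existing last block was written with. */
    static Checksum newChecksum(ChecksumSpec spec) {
        switch (spec.type) {
            case TYPE_CRC32:
                return new CRC32();
            case TYPE_CRC32C:
                return new CRC32C();
            default:
                throw new IllegalArgumentException("unknown checksum type " + spec.type);
        }
    }

    public static void main(String[] args) {
        // Header of a block written before the switch: version 1, CRC32, 512-byte chunks.
        byte[] header = {0, 1, 1, 0, 0, 2, 0};
        ChecksumSpec spec = readMetaHeader(header);
        Checksum adopted = newChecksum(spec);
        System.out.println(adopted.getClass().getSimpleName() + ", " + spec.bytesPerChecksum);
    }
}
```

Running `main` prints `CRC32, 512`. Because the reply carries both the type and the chunk size, this approach covers switching either one, at the cost of the extra round-trip noted under option 1. `java.util.zip.CRC32C` requires Java 9 or later.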
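Option 2 (the datanode verifies with the writer's algorithm, then re-checksums with the on-disk algorithm) can be illustrated with a short Java sketch. This is not Hadoop's BlockReceiver logic. It assumes a fixed, shared chunk size of 512 bytes per checksum and 4-byte big-endian checksum values, and `checksumChunks` and `verifyAndRechecksum` are hypothetical names. CRC32 and CRC32C both produce 32-bit values, which is why swapping only the polynomial keeps the meta-file layout unchanged.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.function.Supplier;
import java.util.zip.CRC32;
import java.util.zip.CRC32C;
import java.util.zip.Checksum;

public class ReChecksumSketch {
    // Assumed chunk size; option 2 requires writer and disk to agree on it.
    static final int BYTES_PER_CHECKSUM = 512;
    // CRC32 and CRC32C both yield 4-byte values, so the checksum size also matches.
    static final int CHECKSUM_SIZE = 4;

    /** Computes one 4-byte big-endian checksum per chunk using the given algorithm. */
    static byte[] checksumChunks(byte[] data, Supplier<Checksum> algo) {
        int chunks = (data.length + BYTES_PER_CHECKSUM - 1) / BYTES_PER_CHECKSUM;
        ByteBuffer out = ByteBuffer.allocate(chunks * CHECKSUM_SIZE);
        for (int off = 0; off < data.length; off += BYTES_PER_CHECKSUM) {
            int len = Math.min(BYTES_PER_CHECKSUM, data.length - off);
            Checksum c = algo.get();
            c.update(data, off, len);
            out.putInt((int) c.getValue()); // low 32 bits hold the CRC
        }
        return out.array();
    }

    /**
     * Hypothetical datanode step: verify the writer's per-chunk checksums with the
     * writer's algorithm, then recompute them with the on-disk algorithm.
     */
    static byte[] verifyAndRechecksum(byte[] data, byte[] writerSums,
                                      Supplier<Checksum> writerAlgo,
                                      Supplier<Checksum> diskAlgo) {
        if (!Arrays.equals(checksumChunks(data, writerAlgo), writerSums)) {
            throw new IllegalStateException("checksum mismatch in incoming packet");
        }
        return checksumChunks(data, diskAlgo);
    }

    public static void main(String[] args) {
        byte[] data = new byte[1300]; // three chunks, the last one partial
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) i;
        }
        // New client writes CRC32C; the existing block on disk uses CRC32.
        byte[] fromWriter = checksumChunks(data, CRC32C::new);
        byte[] forMetaFile = verifyAndRechecksum(data, fromWriter, CRC32C::new, CRC32::new);
        System.out.println(forMetaFile.length + " checksum bytes"); // 3 chunks x 4 bytes
    }
}
```

The second checksum pass in `verifyAndRechecksum` is the "slight performance hit" listed under option 2. It applies only while the last, pre-existing block is being filled, since later blocks can be written with the client's checksum directly. `java.util.zip.CRC32C` requires Java 9 or later.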