[ https://issues.apache.org/jira/browse/KUDU-2526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16581580#comment-16581580 ]
Grant Henke commented on KUDU-2526:
-----------------------------------

Looking at how we could implement this, the most appropriate place to validate appears to be _TabletCopyClient::DownloadBlock_ ([here|https://github.com/apache/kudu/blob/master/src/kudu/tserver/tablet_copy_client.cc#L703]).

The checksums we have today are at the CFile header, footer, and data block (sometimes called page) level. We could read the entire block with the CFileReader to validate the existing checksums, but that may be more CPU intensive than we would like. We could add or break out some utility methods that validate the checksums while parsing as little of the data as possible. The header and footer PBs would still need to be parsed, but we could avoid decompressing the data blocks.

Alternatively, we could add a crc32 checksum to the end of each block. That checksum could then be validated when the block is finalized. The tricky part here is that we don't have any versioning on the block format, because it's not really a format. To support a feature flag and backwards compatibility, we would likely need to add a magic byte at the start so we can identify when a checksum is included.

I am leaning towards the block checksums, but I am curious whether anyone has other ideas or opinions.

> Checksum and validate blocks on tablet copy
> -------------------------------------------
>
>                 Key: KUDU-2526
>                 URL: https://issues.apache.org/jira/browse/KUDU-2526
>             Project: Kudu
>          Issue Type: Improvement
>          Components: tablet copy
>            Reporter: Grant Henke
>            Priority: Major
>
> In order to prevent viral corruption in the case that a leader has a corrupt
> CFile, we should checksum (if needed) and verify the blocks while performing
> a tablet copy.
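
To make the second option in the comment above concrete, here is a minimal sketch of a block framed with a leading magic byte and a trailing crc32. It is illustrative only and is not Kudu code: the names {{kChecksumMagic}}, {{FrameBlock}}, {{CheckBlock}} and {{BlockCheck}}, the magic value 0xC5, and the little-endian trailer layout are all assumptions made for the example. It uses a plain bitwise CRC-32 rather than whatever checksum routine Kudu would actually use.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical marker byte; not part of any existing Kudu block layout.
constexpr uint8_t kChecksumMagic = 0xC5;
constexpr size_t kTrailerSize = sizeof(uint32_t);

// Bitwise CRC-32 (IEEE 802.3, reflected polynomial 0xEDB88320).
uint32_t Crc32(const std::string& data) {
  uint32_t crc = 0xFFFFFFFFu;
  for (unsigned char byte : data) {
    crc ^= byte;
    for (int i = 0; i < 8; ++i) {
      // Shift right; XOR in the polynomial only when the low bit was set.
      crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
  }
  return crc ^ 0xFFFFFFFFu;
}

// Assumed layout: [magic byte][payload][crc32 of payload, little-endian].
std::string FrameBlock(const std::string& payload) {
  std::string framed;
  framed.reserve(1 + payload.size() + kTrailerSize);
  framed.push_back(static_cast<char>(kChecksumMagic));
  framed.append(payload);
  const uint32_t crc = Crc32(payload);
  for (int shift = 0; shift < 32; shift += 8) {
    framed.push_back(static_cast<char>((crc >> shift) & 0xFF));
  }
  assert(framed.size() == 1 + payload.size() + kTrailerSize);
  return framed;
}

enum class BlockCheck { kVerified, kNoChecksum, kCorrupt };

// Classifies a received block. Blocks without the magic byte are treated as
// legacy blocks that carry no checksum, so they are neither verified nor rejected.
BlockCheck CheckBlock(const std::string& block) {
  if (block.size() < 1 + kTrailerSize ||
      static_cast<uint8_t>(block[0]) != kChecksumMagic) {
    return BlockCheck::kNoChecksum;
  }
  const size_t payload_size = block.size() - 1 - kTrailerSize;
  uint32_t stored = 0;
  for (size_t i = 0; i < kTrailerSize; ++i) {
    const uint8_t b = static_cast<uint8_t>(block[1 + payload_size + i]);
    stored |= static_cast<uint32_t>(b) << (8 * i);
  }
  const uint32_t actual = Crc32(block.substr(1, payload_size));
  return stored == actual ? BlockCheck::kVerified : BlockCheck::kCorrupt;
}
```

In this sketch a receiver would call {{CheckBlock}} once the full block has arrived and fail the copy on {{kCorrupt}}. A single magic byte cannot tell a checksummed block apart from a legacy block that happens to start with the same byte, so a real design would need a longer marker or out-of-band metadata to avoid misclassifying old data.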