[ https://issues.apache.org/jira/browse/KUDU-463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15493805#comment-15493805 ]
Todd Lipcon commented on KUDU-463:
----------------------------------
There are some pros and cons to doing it at a "lower level" (in the FS layer,
beneath the individual file formats).
Pros:
- only needs to be done once, for all file formats
- automatically covers all parts of a file
Cons:
- assuming the checksums are interspersed with the real data, you need to
"deinterleave" them on read, which usually adds an extra copy (or requires
complex scatter-gather code, which might not be that efficient either).
- potential read amplification if the checksum chunk size approaches the unit
of IO (the cfile block size), since verifying any part of a chunk means
reading the whole chunk
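To make the two cons concrete, here is a rough sketch in Python. It is not
Kudu code: the layout (a 4-byte checksum after every fixed-size chunk), the
function names, and the sizes are all assumptions for illustration, and
zlib.crc32 stands in for CRC32C, which the Python standard library lacks.

```python
import zlib

CRC_LEN = 4  # assumed: one 4-byte checksum stored after every data chunk


def interleave(data: bytes, chunk_size: int) -> bytes:
    """Write path: store each chunk followed by its checksum."""
    out = bytearray()
    for off in range(0, len(data), chunk_size):
        chunk = data[off:off + chunk_size]
        out += chunk
        # zlib.crc32 is plain CRC32, used here only as a stand-in for CRC32C.
        out += zlib.crc32(chunk).to_bytes(CRC_LEN, "little")
    return bytes(out)


def deinterleave(stored: bytes, chunk_size: int) -> bytes:
    """Read path: verify each chunk and strip its checksum."""
    out = bytearray()
    stride = chunk_size + CRC_LEN
    for off in range(0, len(stored), stride):
        record = stored[off:off + stride]
        chunk, crc = record[:-CRC_LEN], record[-CRC_LEN:]
        if zlib.crc32(chunk).to_bytes(CRC_LEN, "little") != crc:
            raise ValueError(f"checksum mismatch in chunk at offset {off}")
        # This append is the extra copy: the caller wants contiguous data,
        # but on disk the data is broken up by checksums.
        out += chunk
    return bytes(out)


def bytes_to_read(offset: int, length: int, chunk_size: int) -> int:
    """Bytes that must come off disk to verify a logical read.

    Every chunk the read touches has to be read in full, with its checksum.
    """
    first = offset // chunk_size
    last = (offset + length - 1) // chunk_size
    return (last - first + 1) * (chunk_size + CRC_LEN)
```

With these assumptions, a 4096-byte read at logical offset 100 costs 4644
bytes from disk when the checksum chunk is 512 bytes, but 8200 bytes (about
2x) when the chunk is also 4096 bytes, because the read straddles two chunks.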
I think my preference is to do it at the cfile level rather than the FS level,
but I could be convinced otherwise.
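For comparison, a format-level scheme could look roughly like the Python
sketch below. This is only an illustration, not the actual cfile format: the
trailer layout and the names seal_block and open_block are hypothetical. The
CRC32C routine is a plain table-driven implementation of the Castagnoli
polynomial (reflected form 0x82F63B78); a real implementation would likely
use hardware CRC instructions instead.

```python
import struct


def _make_table() -> list:
    """Build the 256-entry lookup table for reflected CRC32C."""
    table = []
    for i in range(256):
        crc = i
        for _ in range(8):
            crc = (crc >> 1) ^ 0x82F63B78 if crc & 1 else crc >> 1
        table.append(crc)
    return table


_TABLE = _make_table()


def crc32c(data) -> int:
    """Compute CRC32C (Castagnoli) over a bytes-like object."""
    crc = 0xFFFFFFFF
    for b in data:
        crc = _TABLE[(crc ^ b) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def seal_block(payload: bytes) -> bytes:
    """Hypothetical write path: append a 4-byte CRC32C trailer to a block."""
    return payload + struct.pack("<I", crc32c(payload))


def open_block(stored: bytes) -> memoryview:
    """Hypothetical read path: verify the trailer, return the payload.

    The payload is contiguous, so a view over it suffices and no
    deinterleaving copy is needed.
    """
    if len(stored) < 4:
        raise ValueError("block too short to hold a checksum")
    view = memoryview(stored)
    payload, trailer = view[:-4], view[-4:]
    (expected,) = struct.unpack("<I", trailer)
    if crc32c(payload) != expected:
        raise ValueError("block checksum mismatch")
    return payload
```

Because the checksum covers exactly one block, which is already the unit of
IO, verifying a block does not require reading anything beyond it. The cost
is that each on-disk format has to add its own trailer like this.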
> Add checksumming to cfile and other on-disk formats
> ---------------------------------------------------
>
> Key: KUDU-463
> URL: https://issues.apache.org/jira/browse/KUDU-463
> Project: Kudu
> Issue Type: Sub-task
> Components: cfile, tablet
> Affects Versions: Private Beta
> Reporter: Todd Lipcon
> Assignee: Adar Dembo
> Labels: kudu-roadmap
>
> We should add CRC32C checksums to cfile blocks, metadata blocks, etc., to
> protect against silent disk corruption. We should probably do this prior to a
> public release, since it will likely have a negative performance impact, and
> we don't want to have a public regression.