[ https://issues.apache.org/jira/browse/KUDU-463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15493805#comment-15493805 ]

Todd Lipcon commented on KUDU-463:
----------------------------------

There are some pros and cons to doing the checksumming at a "lower level" (the FS layer).
Pros:
- only needs to be done once, for all file formats
- automatically covers all parts of a file

Cons:
- assuming the checksums are interspersed with the real data, you need to 
"deinterleave" them on read, which usually adds an extra copy (or complex 
scatter-gather code which might not be that efficient either).
- potential read amplification if the checksum chunk size approaches the unit 
of IO (cfile block size)
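To make the two cons concrete, here is a minimal Python sketch of a hypothetical FS-level layout in which every fixed-size data chunk is followed by a 4-byte checksum. The chunk size, the function names, and the use of the stdlib `zlib.crc32` (plain CRC-32, standing in for CRC32C) are assumptions for illustration, not Kudu's design. Reading a logical byte range has to fetch every chunk it overlaps, verify each one, and copy the payloads into a separate contiguous buffer.

```python
import zlib

CHUNK = 512  # hypothetical checksum chunk size, in data bytes
CRC_LEN = 4  # bytes of checksum stored after each chunk


def interleave(data: bytes) -> bytes:
    """Lay data out on 'disk' as [chunk][crc][chunk][crc]..."""
    out = bytearray()
    for off in range(0, len(data), CHUNK):
        chunk = data[off:off + CHUNK]
        out += chunk
        # zlib.crc32 is plain CRC-32, used here as a stand-in for CRC32C.
        out += zlib.crc32(chunk).to_bytes(CRC_LEN, "little")
    return bytes(out)


def read_range(disk: bytes, offset: int, length: int):
    """Return (payload, physical_bytes_read) for logical [offset, offset+length)."""
    first = offset // CHUNK
    last = (offset + length - 1) // CHUNK
    payload = bytearray()  # the extra "deinterleave" copy
    physical = 0
    for i in range(first, last + 1):
        start = i * (CHUNK + CRC_LEN)
        # The final chunk may be short, so clamp its end to the disk size.
        stored_end = min(start + CHUNK + CRC_LEN, len(disk))
        chunk = disk[start:stored_end - CRC_LEN]
        crc = int.from_bytes(disk[stored_end - CRC_LEN:stored_end], "little")
        if zlib.crc32(chunk) != crc:
            raise IOError(f"checksum mismatch in chunk {i}")
        physical += stored_end - start
        payload += chunk  # copy the data out, skipping the checksum
    skip = offset - first * CHUNK
    return bytes(payload[skip:skip + length]), physical
```

With 2048 bytes of data, `read_range(disk, 500, 20)` straddles one chunk boundary, so it reads two full chunks plus their checksums (1032 physical bytes) to return 20 logical bytes, and then copies the payload once more to strip the checksums out.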

I think my preference is to do it at the cfile level rather than FS, but could 
be convinced otherwise.
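By contrast, a cfile-level scheme can treat each block as one checksum unit: the writer appends a single checksum to the encoded block, and the reader verifies it over the buffer it already read, with no deinterleaving copy. The trailer layout and function names below are hypothetical, and `zlib.crc32` again stands in for CRC32C.

```python
import struct
import zlib


def seal_block(block: bytes) -> bytes:
    """Append a 4-byte little-endian checksum covering the whole block."""
    return block + struct.pack("<I", zlib.crc32(block))


def open_block(stored: bytes) -> memoryview:
    """Verify the trailing checksum and return a zero-copy view of the block body."""
    view = memoryview(stored)
    body, trailer = view[:-4], view[-4:]
    (expected,) = struct.unpack("<I", trailer)
    if zlib.crc32(body) != expected:
        raise IOError("cfile block checksum mismatch")
    return body
```

Because the checksum unit and the IO unit coincide, there is no extra chunk to read and no interleaved checksum to strip; the trade-off is that the same work has to be repeated for each on-disk format.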

> Add checksumming to cfile and other on-disk formats
> ---------------------------------------------------
>
>                 Key: KUDU-463
>                 URL: https://issues.apache.org/jira/browse/KUDU-463
>             Project: Kudu
>          Issue Type: Sub-task
>          Components: cfile, tablet
>    Affects Versions: Private Beta
>            Reporter: Todd Lipcon
>            Assignee: Adar Dembo
>              Labels: kudu-roadmap
>
> We should add CRC32C checksums to cfile blocks, metadata blocks, etc., to 
> protect against silent disk corruption. We should probably do this prior to a 
> public release, since it will likely have a negative performance impact, and 
> we don't want to have a public regression.
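Python's standard library only ships plain CRC-32 (`zlib.crc32`, `binascii.crc32`), not the Castagnoli variant named in the issue. A minimal table-driven, bit-reflected CRC32C sketch follows; the function names are hypothetical, and a real implementation would use a hardware-accelerated path (for example the SSE4.2 `crc32` instruction) rather than this pure-Python loop.

```python
def _make_table(poly: int = 0x82F63B78) -> list:
    """Build the 256-entry lookup table for the reflected Castagnoli polynomial."""
    table = []
    for byte in range(256):
        crc = byte
        for _ in range(8):
            crc = (crc >> 1) ^ poly if crc & 1 else crc >> 1
        table.append(crc)
    return table


_TABLE = _make_table()


def crc32c(data: bytes, crc: int = 0) -> int:
    """CRC32C of data; pass a previous result as crc to checksum incrementally."""
    crc ^= 0xFFFFFFFF
    for b in data:
        crc = _TABLE[(crc ^ b) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF
```

`crc32c(b"123456789")` returns `0xE3069283`, the standard CRC-32C check value, and feeding a block in pieces (`crc32c(b"6789", crc32c(b"12345"))`) gives the same result as checksumming it in one call.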



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)