As we make more buffering improvements in Hadoop (HADOOP-3065,
HADOOP-1702, etc.), the relative cost of checksums keeps growing.
+1 for looking into better checksums.
Raghu.
Raghu Angadi wrote:
Datanodes and persistent storage can deal with different checksums, but
the client does not support that yet (this is easier to fix, since the
client is not tied to persistent data).
Regarding CPU comparisons, the most reliable approach I have found is
either to max out the CPU on a machine and compare the time taken, or to
compare the CPU time reported in /proc/pid/stat. See
https://issues.apache.org/jira/browse/HADOOP-1702?focusedCommentId=12575553#action_12575553
for an example.
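
A rough sketch of the /proc/pid/stat comparison, for illustration only.
It assumes Linux, reads /proc/self/stat for the current JVM, and sums the
utime and stime fields (fields 14 and 15, in clock ticks). The class and
method names (ChecksumCpuCost, parseCpuTicks, ticksFor) are made up for
this example, and CRC32 vs. Adler32 is just a sample pair of checksums to
compare, not a recommendation. The numbers are process-wide, so they
include JIT and GC threads; treat them as approximate.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Random;
import java.util.zip.Adler32;
import java.util.zip.CRC32;
import java.util.zip.Checksum;

public class ChecksumCpuCost {

    // Returns utime + stime (in clock ticks) from one line of /proc/<pid>/stat.
    static long parseCpuTicks(String statLine) {
        // Field 2 (comm) is parenthesised and may contain spaces, so only
        // split the text that follows the last ')'.
        String rest = statLine.substring(statLine.lastIndexOf(')') + 1).trim();
        String[] f = rest.split("\\s+");
        // f[0] is field 3 (state), so utime (14) is f[11] and stime (15) is f[12].
        return Long.parseLong(f[11]) + Long.parseLong(f[12]);
    }

    // CPU ticks consumed so far by this JVM process (Linux only).
    static long selfCpuTicks() throws IOException {
        byte[] raw = Files.readAllBytes(Paths.get("/proc/self/stat"));
        return parseCpuTicks(new String(raw, StandardCharsets.US_ASCII));
    }

    // CPU ticks spent checksumming the buffer 'rounds' times.
    static long ticksFor(Checksum sum, byte[] data, int rounds) throws IOException {
        long before = selfCpuTicks();
        for (int i = 0; i < rounds; i++) {
            sum.reset();
            sum.update(data, 0, data.length);
        }
        return selfCpuTicks() - before;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[64 * 1024 * 1024];
        new Random(42).nextBytes(data);
        int rounds = 20; // arbitrary; raise it if the tick counts are too small
        System.out.println("CRC32   ticks: " + ticksFor(new CRC32(), data, rounds));
        System.out.println("Adler32 ticks: " + ticksFor(new Adler32(), data, rounds));
    }
}
```

Ticks convert to seconds by dividing by the value of `getconf CLK_TCK`
(commonly 100). Running each case in a fresh JVM makes the two numbers
easier to compare.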
Raghu.