smengcl opened a new pull request, #7189: URL: https://github.com/apache/ozone/pull/7189
## What changes were proposed in this pull request?

**Warning: POC. Unoptimized implementation. Missing test cases.**

### Problem Statement

Currently, by default, each 4 MB block chunk is further divided into 16 KB chunks (down from 1 MB, changed in HDDS-10465) for checksum calculation. The problem is that, even with the smaller checksum chunk, clients still calculate the checksum for the whole 4 MB block chunk **from the beginning** every single time:

https://github.com/apache/ozone/blob/e57370124a36315d2be5791753912901f836ccd8/hadoop-hdds/common/src/main/java/org/apache/hadoop/ozone/common/Checksum.java#L171-L177

This PR aims to implement a checksum cache to reduce the CPU time spent in the critical section for checksum calculation, in the hope of greatly improving client hsync throughput when checksum is enabled.

### TODOs

- [ ] Add client config key to enable client checksum cache?
- [ ] Thoroughly test all code paths using `Checksum`

### Future Work

There are many more improvements that could be made to checksum calculation, such as:

1. Even finer-grained incremental checksum calculation.
   - For CRC32/CRC32C, the checksum can be updated on a byte-by-byte basis (rather than having to recalculate the entire 16 KB).
   - For SHA256 and MD5, the checksum can be updated every 64 bytes (512 bits).
2. Transfer only the unacknowledged checksums to the datanode. Requires proto change.

But those are beyond the scope of this jira and would require major refactoring.

## What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-10411

## How was this patch tested?

- [ ] New UT cases to be added
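
### Illustrative sketch of the caching idea

For reviewers, here is a minimal sketch of the checksum cache idea from the problem statement. It is not the code in this PR. It assumes an append-only write buffer (bytes already written never change) and uses `java.util.zip.CRC32`. The names `ChecksumCacheSketch`, `computeChecksums`, `naiveChecksums` and `getChunksHashed` are hypothetical and are not part of the `Checksum` class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.zip.CRC32;

/** Hypothetical sketch: caches CRC32 values of fully written checksum chunks. */
class ChecksumCacheSketch {
  private final int bytesPerChecksum;
  // CRCs of completely filled checksum chunks. With an append-only buffer
  // these never change, so they can be reused on every later call.
  private final List<Long> cached = new ArrayList<>();
  // Number of chunk hashes actually computed, kept only for illustration.
  private long chunksHashed = 0;

  ChecksumCacheSketch(int bytesPerChecksum) {
    this.bytesPerChecksum = bytesPerChecksum;
  }

  /** Returns one CRC32 per checksum chunk covering data[0, length). */
  List<Long> computeChecksums(byte[] data, int length) {
    int fullChunks = length / bytesPerChecksum;
    // Hash only the full chunks that are not in the cache yet.
    for (int i = cached.size(); i < fullChunks; i++) {
      cached.add(crc(data, i * bytesPerChecksum, bytesPerChecksum));
      chunksHashed++;
    }
    List<Long> result = new ArrayList<>(cached.subList(0, fullChunks));
    // The trailing partial chunk may still grow, so it is always recomputed.
    int tail = length % bytesPerChecksum;
    if (tail > 0) {
      result.add(crc(data, fullChunks * bytesPerChecksum, tail));
      chunksHashed++;
    }
    return result;
  }

  long getChunksHashed() {
    return chunksHashed;
  }

  /** Baseline without a cache: recomputes every chunk from offset 0. */
  static List<Long> naiveChecksums(byte[] data, int length, int bytesPerChecksum) {
    List<Long> result = new ArrayList<>();
    for (int off = 0; off < length; off += bytesPerChecksum) {
      result.add(crc(data, off, Math.min(bytesPerChecksum, length - off)));
    }
    return result;
  }

  private static long crc(byte[] data, int off, int len) {
    CRC32 crc32 = new CRC32();
    crc32.update(data, off, len);
    return crc32.getValue();
  }

  public static void main(String[] args) {
    int bytesPerChecksum = 16 * 1024;
    byte[] buffer = new byte[50_000];
    new Random(7).nextBytes(buffer);

    ChecksumCacheSketch cache = new ChecksumCacheSketch(bytesPerChecksum);
    // Simulate two hsync calls on a growing buffer.
    List<Long> afterFirst = cache.computeChecksums(buffer, 40_000);
    List<Long> afterSecond = cache.computeChecksums(buffer, 50_000);

    System.out.println("matches naive (1st): "
        + afterFirst.equals(naiveChecksums(buffer, 40_000, bytesPerChecksum)));
    System.out.println("matches naive (2nd): "
        + afterSecond.equals(naiveChecksums(buffer, 50_000, bytesPerChecksum)));
    System.out.println("chunks hashed with cache: " + cache.getChunksHashed());
  }
}
```

In this sketch, the second call hashes only the newly completed 16 KB chunk and the trailing partial chunk, rather than every chunk from the start of the buffer. A real implementation would also need to invalidate the cache when the buffer is reset or reused, which this sketch does not handle.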
