smengcl opened a new pull request, #7189:
URL: https://github.com/apache/ozone/pull/7189

   ## What changes were proposed in this pull request?
   
   **Warning: POC. Unoptimized implementation. Missing test cases.**
   
   ### Problem Statement
   
   By default, each 4 MB block chunk is further divided into 16 KB checksum 
chunks (down from 1 MB, changed in HDDS-10465), and a checksum is calculated 
for each of them.
   
   The problem is that, even with the smaller checksum chunk size, clients 
still recalculate the checksums for the whole 4 MB block chunk **from the 
beginning** every single time:
   
   
https://github.com/apache/ozone/blob/e57370124a36315d2be5791753912901f836ccd8/hadoop-hdds/common/src/main/java/org/apache/hadoop/ozone/common/Checksum.java#L171-L177
   
   This PR aims to implement a checksum cache that reduces the CPU time spent 
in the critical section on checksum calculation, in the hope of greatly 
improving client hsync throughput when checksums are enabled.
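   
   The idea can be illustrated with a minimal sketch. This is not the code in 
this PR: `ChecksumCacheSketch`, `computeChecksums` and `clear` are hypothetical 
names, plain `java.util.zip.CRC32` stands in for Ozone's `Checksum` class, and 
the sketch assumes append-only writes, so bytes that were already checksummed 
never change. Checksums of full 16 KB checksum chunks are cached, and only the 
new or still-growing chunks are recalculated.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.zip.CRC32;

/** Hypothetical sketch of a per-block-chunk checksum cache, not Ozone's Checksum class. */
final class ChecksumCacheSketch {
  private final int bytesPerChecksum;
  // CRC32 of every checksum chunk that was already full on an earlier call.
  private final List<Long> cached = new ArrayList<>();
  // Number of CRC32 computations performed so far (for demonstration only).
  private int computations = 0;

  ChecksumCacheSketch(int bytesPerChecksum) {
    if (bytesPerChecksum <= 0) {
      throw new IllegalArgumentException("bytesPerChecksum must be positive");
    }
    this.bytesPerChecksum = bytesPerChecksum;
  }

  /**
   * Returns one CRC32 per checksum chunk for data[0, length).
   * Assumes append-only writes: bytes seen by an earlier call never change.
   */
  List<Long> computeChecksums(byte[] data, int length) {
    int fullChunks = length / bytesPerChecksum;
    // Only full chunks that are not cached yet are computed.
    for (int i = cached.size(); i < fullChunks; i++) {
      cached.add(crcOf(data, i * bytesPerChecksum, bytesPerChecksum));
    }
    List<Long> result = new ArrayList<>(cached.subList(0, fullChunks));
    // The trailing partial chunk can still grow, so it is never cached.
    int tail = length - fullChunks * bytesPerChecksum;
    if (tail > 0) {
      result.add(crcOf(data, fullChunks * bytesPerChecksum, tail));
    }
    return result;
  }

  /** Must be called when the client moves on to a new block chunk. */
  void clear() {
    cached.clear();
  }

  int computations() {
    return computations;
  }

  private long crcOf(byte[] data, int off, int len) {
    computations++;
    CRC32 crc = new CRC32();
    crc.update(data, off, len);
    return crc.getValue();
  }

  public static void main(String[] args) {
    byte[] chunk = new byte[40 * 1024];
    ChecksumCacheSketch cache = new ChecksumCacheSketch(16 * 1024);
    cache.computeChecksums(chunk, 20 * 1024); // 1 full chunk + 1 partial chunk
    cache.computeChecksums(chunk, 40 * 1024); // 1 cached, 1 new full, 1 partial
    System.out.println("CRC32 computations: " + cache.computations());
  }
}
```

   In this sketch the second call reuses the cached checksum of the first 16 KB 
and computes only two checksums instead of three. The saving grows with the 
number of full checksum chunks that precede the write position.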
   
   ### TODOs
   
   - [ ] Add client config key to enable client checksum cache?
   - [ ] Thoroughly test all code paths using `Checksum`
   
   ### Future Work
   
   Many more improvements can be made to checksum calculation, such as:
   1. Even finer-grained incremental checksum calculation.
     - For CRC32/CRC32C, the checksum can be updated on a byte-by-byte basis 
(rather than having to recalculate the entire 16 KB)
     - For SHA256 and MD5, the checksum can be updated every 64 bytes (512 
bits).
   2. Transfer only the unacknowledged checksums to the datanode. Requires 
proto change.
   
   Those are beyond the scope of this Jira and would require major 
refactoring.
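   
   For reference, the byte-granular CRC32 update mentioned in item 1 can be 
sketched with the JDK's `java.util.zip.CRC32`, whose `update` accepts further 
bytes after `getValue()` has been read. `IncrementalCrcSketch`, `append` and 
`fromScratch` are hypothetical names used only for illustration, and this is 
not how Ozone's `Checksum` class works today.

```java
import java.util.zip.CRC32;

/** Hypothetical sketch: keeps a running CRC32 so only appended bytes are processed. */
final class IncrementalCrcSketch {
  private final CRC32 crc = new CRC32();

  /** Feeds only the newly appended bytes and returns the CRC32 of all bytes so far. */
  long append(byte[] buf, int off, int len) {
    crc.update(buf, off, len);
    return crc.getValue(); // reading the value does not reset the running state
  }

  /** Reference implementation that recomputes from the first byte. */
  static long fromScratch(byte[] buf, int len) {
    CRC32 fresh = new CRC32();
    fresh.update(buf, 0, len);
    return fresh.getValue();
  }

  public static void main(String[] args) {
    byte[] data = new byte[1000];
    for (int i = 0; i < data.length; i++) {
      data[i] = (byte) i;
    }
    IncrementalCrcSketch running = new IncrementalCrcSketch();
    running.append(data, 0, 300);
    long incremental = running.append(data, 300, 700);
    // Both approaches are expected to agree on the final checksum.
    System.out.println(incremental == fromScratch(data, data.length));
  }
}
```

   For SHA256 and MD5, `java.security.MessageDigest` offers a similar `update` 
call, but `digest()` resets the state, so an intermediate value would need a 
cloned digest where the provider supports cloning.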
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/HDDS-10411
   
   ## How was this patch tested?
   
   - [ ] New UT cases to be added


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

