[
https://issues.apache.org/jira/browse/HDDS-11077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ethan Rose reassigned HDDS-11077:
---------------------------------
Assignee: Ethan Rose (was: Ritesh Shukla)
> Optimize checksum calculations in container merkle tree
> -------------------------------------------------------
>
> Key: HDDS-11077
> URL: https://issues.apache.org/jira/browse/HDDS-11077
> Project: Apache Ozone
> Issue Type: Sub-task
> Reporter: Ethan Rose
> Assignee: Ethan Rose
> Priority: Major
>
> *Choosing an Implementation*
> There are two main places we can get our checksum implementations from:
> * {{java.util.zip.CRC32[C\]}} which use native code.
> * {{PureJavaCrc32[C\]}} which has implementations in Ozone, Hadoop, and
> Apache Commons that are all more or less copied from each other.
> The considerations in choosing an implementation are:
> * CRC32C is a general improvement over CRC32.
> * {{java.util.zip.CRC32C}} does not exist until Java 9. Java 8 only has
> {{CRC32}}.
> * {{java.util.zip.Checksum#update(ByteBuffer)}} does not exist until Java 9.
> This is why Ozone has the {{ChecksumByteBuffer}} wrapper class.
> Previous work to determine which checksum to use on data in Ozone was done
> [here|https://github.com/apache/ozone/pull/1910#issuecomment-775165462] and
> [here|https://github.com/apache/ozone/pull/1950]. These links explain the
> decision to default to {{java.util.zip.CRC32}} in Ozone. When CRC32C is
> specified, they also add the ability to choose between {{PureJavaCrc32C}} and
> {{java.util.zip.CRC32C}} based on the Java version.
>
> *Choosing an update method*
> It looks like {{java.util.zip.Checksum#update(int)}} only reads the low-order
> eight bits of the int. This is based on the [Java 9 javadoc for
> CRC32C|https://docs.oracle.com/javase%2F9%2Fdocs%2Fapi%2F%2F/java/util/zip/CRC32C.html#update-int-].
> Other implementations do not specify whether the whole int is read or not.
> Since each call consumes only a single byte, I'm not sure it is any better
> than using a byte buffer/array, either to roll the longs into the checksum
> one by one or to batch the checksum computation over a buffer of all the
> longs under a tree node.
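
The three update options described above can be sketched side by side. This is only an illustration, not code from Ozone: the class name LongChecksumSketch and its method names are hypothetical, big-endian byte order is an assumption, and java.util.zip.CRC32 is used only because it is available on Java 8. It shows that the three approaches should produce the same checksum value; it does not measure which one is faster.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

// Hypothetical sketch: three ways to fold an array of longs into one checksum.
public class LongChecksumSketch {

    // Option 1: update(int) consumes only the low eight bits of its argument,
    // so each long needs eight calls (most significant byte first).
    public static long perByte(long[] values) {
        CRC32 crc = new CRC32();
        for (long v : values) {
            for (int shift = 56; shift >= 0; shift -= 8) {
                crc.update((int) (v >>> shift));
            }
        }
        return crc.getValue();
    }

    // Option 2: roll the longs in one at a time through a reused 8-byte buffer.
    public static long perLong(long[] values) {
        CRC32 crc = new CRC32();
        ByteBuffer buf = ByteBuffer.allocate(Long.BYTES); // big-endian by default
        for (long v : values) {
            buf.clear();
            buf.putLong(v);
            crc.update(buf.array(), 0, Long.BYTES);
        }
        return crc.getValue();
    }

    // Option 3: write all longs into one buffer and checksum it in a single call.
    public static long batched(long[] values) {
        ByteBuffer buf = ByteBuffer.allocate(values.length * Long.BYTES);
        for (long v : values) {
            buf.putLong(v);
        }
        buf.flip(); // switch from writing to reading
        CRC32 crc = new CRC32();
        crc.update(buf); // CRC32#update(ByteBuffer) exists on the class since Java 8
        return crc.getValue();
    }

    public static void main(String[] args) {
        // Stand-ins for the checksums of the children under one tree node.
        long[] childChecksums = {1L, 0x0102030405060708L, -1L};
        System.out.println(Long.toHexString(perByte(childChecksums)));
        System.out.println(Long.toHexString(perLong(childChecksums)));
        System.out.println(Long.toHexString(batched(childChecksums)));
    }
}
```

Since all three feed the same byte sequence to the checksum, the choice between them is about call overhead and buffer allocation rather than the resulting value.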