[
https://issues.apache.org/jira/browse/HDDS-10411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17820213#comment-17820213
]
Wei-Chiu Chuang commented on HDDS-10411:
----------------------------------------
Another thing we're considering now is reducing the checksum chunk size from
1 MB to a lower number, like 64 KB.
(1) The chunk size was determined years ago to be optimal. However, a lot has
changed since then; in particular, with the incremental chunk list
implementation, the metadata size associated with a smaller chunk size is no
longer a huge issue (see the back-of-envelope sketch below).
(2) A big checksum chunk size makes data more likely to be irrecoverable,
because a checksum mismatch invalidates the entire chunk it covers.
(3) The input stream suffers from GC and memory allocation problems associated
with the bigger buffers required for the chunks.
(4) In the HBase case, it only reads 64 KB at a time, so having a big input
buffer makes no sense.
We should revisit the chunk size later.
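As a back-of-envelope sketch (not Ozone code; it assumes the 4 MB default
chunk buffer mentioned in the issue below and the 4-byte checksum values
produced by {{int2ByteString}}), the metadata cost of a smaller checksum chunk
stays tiny:
{code:java}
// Rough metadata math for one 4 MB chunk buffer with 4-byte checksum values.
public class ChecksumOverhead {
  public static void main(String[] args) {
    final int chunkBytes = 4 * 1024 * 1024;   // 4 MB buffer
    final int checksumValueBytes = 4;         // int CRC value per chunk
    for (int bytesPerChecksum : new int[] {1024 * 1024, 64 * 1024}) {
      int n = (chunkBytes + bytesPerChecksum - 1) / bytesPerChecksum;
      System.out.printf("bytesPerChecksum=%4d KB -> %2d checksums, %3d bytes%n",
          bytesPerChecksum / 1024, n, n * checksumValueBytes);
    }
  }
}
{code}
Going from 1 MB to 64 KB grows the per-4MB-chunk checksum metadata from 16
bytes to 256 bytes, which supports point (1).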
> Support incremental ChunkBuffer checksum calculation
> ----------------------------------------------------
>
> Key: HDDS-10411
> URL: https://issues.apache.org/jira/browse/HDDS-10411
> Project: Apache Ozone
> Issue Type: Sub-task
> Reporter: Siyao Meng
> Assignee: Siyao Meng
> Priority: Major
>
> h2. Goal
> Calculate the ChunkBuffer (ByteBuffer) checksum incrementally rather than
> having to calculate it from scratch every single time in
> {{writeChunkToContainer}}.
> h2. Background
> Currently the ChunkBuffer (ByteBuffer) checksum is always calculated from
> scratch, as can be seen in the checksum function initialization below, which
> always calls {{reset()}} before feeding any data with {{update()}}:
> {code:title=newChecksumByteBufferFunction:
> https://github.com/apache/ozone/blob/5f0925e190f1dbf2d4617daad8ad42401d5079e1/hadoop-hdds/common/src/main/java/org/apache/hadoop/ozone/common/Checksum.java#L67-L68}
>   private static Function<ByteBuffer, ByteString>
>       newChecksumByteBufferFunction(
>           Supplier<ChecksumByteBuffer> constructor) {
>     final ChecksumByteBuffer algorithm = constructor.get();
>     return data -> {
>       algorithm.reset();
>       algorithm.update(data);
>       return int2ByteString((int) algorithm.getValue());
>     };
>   }
> {code}
> Each ByteBuffer (4 MB by default) inside a block's ChunkBuffer gets its
> checksum calculated here:
> {code:title=https://github.com/apache/ozone/blob/5f0925e190f1dbf2d4617daad8ad42401d5079e1/hadoop-hdds/common/src/main/java/org/apache/hadoop/ozone/common/Checksum.java#L171-L177}
>   // Checksum is computed for each bytesPerChecksum number of bytes of data
>   // starting at offset 0. The last checksum might be computed for the
>   // remaining data with length less than bytesPerChecksum.
>   final List<ByteString> checksumList = new ArrayList<>();
>   for (ByteBuffer b : data.iterate(bytesPerChecksum)) {
>     checksumList.add(computeChecksum(b, function, bytesPerChecksum));
>   }
> {code}
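> A self-contained sketch of the slicing above (illustrative only; it uses
> plain {{java.util.zip.CRC32}} in place of Ozone's {{ChecksumByteBuffer}}, and
> the class/method names here are made up):
> {code:java}
> import java.nio.ByteBuffer;
> import java.util.ArrayList;
> import java.util.List;
> import java.util.zip.CRC32;
>
> public class SliceChecksums {
>   // One CRC32 per bytesPerChecksum-sized slice, like data.iterate(...).
>   static List<Integer> checksums(ByteBuffer data, int bytesPerChecksum) {
>     final List<Integer> result = new ArrayList<>();
>     final ByteBuffer buf = data.duplicate();
>     while (buf.hasRemaining()) {
>       final int len = Math.min(bytesPerChecksum, buf.remaining());
>       final ByteBuffer slice = buf.slice();
>       slice.limit(len);
>       final CRC32 crc = new CRC32();    // fresh state per slice
>       crc.update(slice);                // consumes the slice
>       result.add((int) crc.getValue()); // 4-byte value, as in int2ByteString
>       buf.position(buf.position() + len);
>     }
>     return result;
>   }
> }
> {code}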
> That loop in {{Checksum.java}} is called from
> [{{BlockOutputStream#writeChunkToContainer}}|https://github.com/apache/ozone/blob/f0b75b7e4ee93e89f9e4fc96cb30d59f78746eb5/hadoop-hdds/client/src/main/java/org/apache/hadoop/hdds/scm/storage/BlockOutputStream.java#L697].
> When the checksum function is applied inside {{computeChecksum}}, it always
> calls {{reset()}} first, so it recalculates the whole ByteBuffer from
> offset 0.
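> For contrast, a minimal sketch of the incremental idea (illustrative only,
> not the actual patch): keep the checksum state alive across calls and feed
> it only the bytes appended since the previous call, instead of {{reset()}}
> plus a full recompute:
> {code:java}
> import java.nio.ByteBuffer;
> import java.util.zip.CRC32;
>
> // Illustrative only: CRC32 accumulates state across update() calls, so a
> // stateful checksummer that is never reset() only pays for new bytes.
> public class IncrementalCrc {
>   private final CRC32 crc = new CRC32();
>   private int bytesSeen = 0;
>
>   // data holds every byte written so far; only [bytesSeen, limit) is new.
>   public int update(ByteBuffer data) {
>     final ByteBuffer tail = data.duplicate();
>     tail.position(bytesSeen);      // skip bytes already checksummed
>     bytesSeen += tail.remaining();
>     crc.update(tail);              // extend the running checksum
>     return (int) crc.getValue();   // checksum of all bytes seen so far
>   }
> }
> {code}
> A few bytes appended between {{hsync()}} calls then cost O(appended bytes)
> instead of O(buffer size).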
> h2. Motivation
> While this may not have been a big issue before Ozone {{hsync()}} was
> implemented (or in HDFS, where each chunk is much smaller, at 64 KB by
> default), it can now account for ~10% of the hsync latency between the
> client and the DataNode when the client only appends a few bytes between
> hsyncs, as can be seen in [~weichiu]'s flame graph.
> The estimated latency improvement from this change is 0%~20%, depending on
> the client's write/hsync pattern.