[ https://issues.apache.org/jira/browse/HDDS-11595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17891615#comment-17891615 ]

Wei-Chiu Chuang commented on HDDS-11595:
----------------------------------------

I think it makes sense to have a new proto field for this. We've overloaded 
WriteChunk so much that it has become hard to reason about the code logic.
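A hedged sketch of what such a field could look like (the field name, tag number, and placement are hypothetical, not the actual Ozone proto definition):

```protobuf
// Hypothetical addition to the WriteChunk request message.
// Field name and tag number are illustrative only.
message WriteChunkRequestProto {
  // ... existing fields ...

  // Offset into the chunk at which this payload starts, letting the
  // client send only the bytes written since the last flush instead
  // of the whole chunk buffer.
  optional uint64 incrementalOffset = 100;
}
```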

To do so, note that there are several places in the ContainerStateMachine that 
handle WriteChunk differently, so the new field would need to be handled in 
each of them as well: 
https://github.com/apache/ozone/blob/master/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/transport/server/ratis/ContainerStateMachine.java#L394
 
KeyValueHandler would also need to distinguish between the WRITE and COMMIT 
stages: 
https://github.com/apache/ozone/blob/master/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueHandler.java#L887
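A minimal sketch of the kind of stage-aware branching meant here; the enum values mirror Ozone's DispatcherContext.WriteChunkStage, but the dispatch method and its return values are purely illustrative:

```java
// Sketch only: the enum constants mirror DispatcherContext.WriteChunkStage,
// but this class and its dispatch() method are hypothetical, not Ozone code.
enum WriteChunkStage { WRITE_DATA, COMMIT_DATA, COMBINED }

class StageDispatchSketch {
    String dispatch(WriteChunkStage stage, boolean incremental) {
        switch (stage) {
            case WRITE_DATA:
                // Ratis log-append path: persist the (possibly partial) payload.
                return incremental ? "write-delta" : "write-full";
            case COMMIT_DATA:
                // applyTransaction path: no payload, only metadata to commit.
                return "commit";
            default:
                // Single-step path that both writes and commits.
                return "combined";
        }
    }
}
```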
 

> [hsync] Improve WriteChunk to send incremental data only
> --------------------------------------------------------
>
>                 Key: HDDS-11595
>                 URL: https://issues.apache.org/jira/browse/HDDS-11595
>             Project: Apache Ozone
>          Issue Type: Sub-task
>            Reporter: Siyao Meng
>            Assignee: Siyao Meng
>            Priority: Major
>
> Currently, {{BOS#WriteChunk}} appears to always send the entire client chunk 
> buffer to the DataNode every time it is called (e.g. during hsync). That is 
> quite inefficient.
>  
> {{data}} here is the entire client key block chunk buffer, rather than just 
> the newly written part (not yet flushed to DNs), which would be the ideal 
> behavior:
> {code:java|title=https://github.com/apache/ozone/blob/b23981cbdaf27134d9a622548977f10256e41d44/hadoop-hdds/client/src/main/java/org/apache/hadoop/hdds/scm/storage/BlockOutputStream.java#L950-L951}
> asyncReply = writeChunkAsync(xceiverClient, chunkInfo,
>     blockID.get(), data, tokenString, replicationIndex, blockData, close); 
> {code}
> It looks like the full-buffer transfer exists mostly for (the ease of) 
> DataNode-side checksum verification:
> [https://github.com/apache/ozone/blob/f563d676dc6b6cb9e0ed5d288b94e7660a2584c1/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueHandler.java#L891]
>  
> As for the actual chunk *write to disk* on a DataNode, it does *seem* to use 
> the chunk offset (stored in ChunkInfo):
> [https://github.com/apache/ozone/blob/274da83cfe00b5bea89fd728f74007936183fbde/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/helpers/ChunkUtils.java#L141-L142]
> We need to double-check that it is indeed *not overwriting* the existing 
> chunk (which would be harmless but wasteful).
>  
> Goal: Get rid of such duplicate transfer.
>  
> Note: if this requires a protocol change, it will likely be a new proto 
> message rather than a modification of the existing WriteChunkProto.
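On the client side, the incremental send the description asks for amounts to tracking how many bytes have already been flushed and slicing out only the delta. A minimal sketch, assuming hypothetical names (this class and its methods are not Ozone APIs):

```java
import java.nio.ByteBuffer;

// Hypothetical client-side bookkeeping for incremental WriteChunk.
// The class and method names are illustrative, not actual Ozone code.
public class IncrementalChunkTracker {
    private long flushedLength = 0; // bytes already acknowledged by DataNodes

    /** Returns a zero-copy view of only the not-yet-flushed suffix. */
    public ByteBuffer nextDelta(ByteBuffer chunkBuffer, long writtenLength) {
        ByteBuffer delta = chunkBuffer.duplicate();
        delta.position((int) flushedLength);
        delta.limit((int) writtenLength);
        return delta.slice(); // only the bytes written since the last flush
    }

    /** Advances the watermark once the DataNodes acknowledge the flush. */
    public void onFlushAck(long writtenLength) {
        flushedLength = writtenLength;
    }
}
```

With this kind of watermark, a second hsync of the same chunk would carry only the new suffix, paired with the offset so the DataNode can write it at the right position.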



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
