[
https://issues.apache.org/jira/browse/HDDS-11595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Siyao Meng updated HDDS-11595:
------------------------------
Description:
Update:
Upon closer inspection, the {{chunk/data}} passed into
{{BOS#writeChunkToContainer}} is in fact not the whole chunk buffer
accumulated from the beginning (default 4 MB max), but an incremental buffer
that is reset every time a flush happens:
[https://github.com/apache/ozone/blob/b23981cbdaf27134d9a622548977f10256e41d44/hadoop-hdds/client/src/main/java/org/apache/hadoop/hdds/scm/storage/BlockOutputStream.java#L747-L749]
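The buffering pattern at the link above can be sketched roughly as follows. This is a hypothetical illustration (class and method names invented, not the actual {{BlockOutputStream}} code): only the bytes written since the last flush are handed out, and the buffer resets afterwards, so each WriteChunk payload is an incremental delta.
{code:java}
import java.nio.ByteBuffer;

// Hypothetical sketch: buffer only the bytes written since the last
// flush, so a flush sends an incremental delta, not the whole chunk.
class IncrementalChunkBuffer {
    private ByteBuffer buffer = ByteBuffer.allocate(4 * 1024 * 1024);

    // Append newly written bytes since the last flush.
    void write(byte[] data) {
        buffer.put(data);
    }

    // Hand out only the un-flushed delta, then reset for the next round.
    ByteBuffer takeForFlush() {
        buffer.flip();
        ByteBuffer delta = buffer.duplicate();
        buffer = ByteBuffer.allocate(buffer.capacity());
        return delta;
    }
}
{code}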
Other points about protocol inefficiency below may still be valid. Closing
this jira.
----
Original description (for reference only):
From the way it looks, {{BOS#WriteChunk}} currently always sends the entire
client chunk buffer to the DataNode every single time it is called (e.g.
during hsync). That is quite inefficient.
{{data}} here is the entire client key block chunk buffer, rather than just
the newly written part not yet flushed to DNs (which would be the ideal
behavior):
[https://github.com/apache/ozone/blob/b23981cbdaf27134d9a622548977f10256e41d44/hadoop-hdds/client/src/main/java/org/apache/hadoop/hdds/scm/storage/BlockOutputStream.java#L950-L951]
{code:java}
asyncReply = writeChunkAsync(xceiverClient, chunkInfo,
    blockID.get(), data, tokenString, replicationIndex, blockData, close);
{code}
It looks like the full-buffer transfer exists mostly to ease DataNode-side
checksum verification:
[https://github.com/apache/ozone/blob/f563d676dc6b6cb9e0ed5d288b94e7660a2584c1/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueHandler.java#L891]
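Conceptually, that verification recomputes a checksum over the received payload and compares it with the value carried alongside the chunk. A minimal sketch, not the actual {{KeyValueHandler}} code, and using CRC32 purely as a stand-in for Ozone's configurable checksum types:
{code:java}
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

// Sketch: recompute a checksum over the received bytes and compare it
// with the checksum the client sent with the chunk.
class ChecksumSketch {
    static boolean verify(ByteBuffer data, long expectedChecksum) {
        CRC32 crc = new CRC32();
        crc.update(data.duplicate()); // duplicate() so caller's position is untouched
        return crc.getValue() == expectedChecksum;
    }
}
{code}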
As for the actual chunk *write to disk* on a DataNode, it does *seem* to use
the chunk offset stored in {{ChunkInfo}}:
[https://github.com/apache/ozone/blob/274da83cfe00b5bea89fd728f74007936183fbde/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/helpers/ChunkUtils.java#L141-L142]
Need to double check that it is indeed *not overwriting* the existing chunk
data (overwriting would be harmless, but wasted work).
Goal: Eliminate this duplicate transfer.
Note: If protocol changes are needed, this will likely be a new message
rather than a change to the existing WriteChunkProto.
> [hsync] Improve WriteChunk to send incremental data only
> --------------------------------------------------------
>
> Key: HDDS-11595
> URL: https://issues.apache.org/jira/browse/HDDS-11595
> Project: Apache Ozone
> Issue Type: Sub-task
> Reporter: Siyao Meng
> Assignee: Siyao Meng
> Priority: Major
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)