[ 
https://issues.apache.org/jira/browse/HDDS-11595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siyao Meng updated HDDS-11595:
------------------------------
    Description: 
From the way it looks, {{BOS#WriteChunk}} currently sends the entire client chunk buffer to the DataNode every single time it is called (e.g. during hsync). That is quite inefficient.

 

{{data}} here is the entire chunk buffer of the client's key block, rather than just the newly written portion that has not yet been flushed to the DNs (which would be the ideal behavior):

{code:java|title=https://github.com/apache/ozone/blob/b23981cbdaf27134d9a622548977f10256e41d44/hadoop-hdds/client/src/main/java/org/apache/hadoop/hdds/scm/storage/BlockOutputStream.java#L950-L951}
asyncReply = writeChunkAsync(xceiverClient, chunkInfo,
    blockID.get(), data, tokenString, replicationIndex, blockData, close); 
{code}
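To illustrate the idea (a minimal sketch; the method and parameter names here are assumptions, not the actual Ozone API): if the client tracked how many bytes the DNs have already acknowledged, it could slice off and send only the delta instead of the whole buffer:

{code:java}
import java.nio.ByteBuffer;

public class IncrementalSlice {
    // Hypothetical helper: returns only the bytes written since the last
    // flush, i.e. the portion the DataNodes do not have yet.
    static ByteBuffer unflushedPortion(ByteBuffer chunkBuffer, int flushedLen) {
        ByteBuffer slice = chunkBuffer.duplicate();
        slice.position(flushedLen);          // skip what DNs already hold
        slice.limit(chunkBuffer.position()); // up to the current write position
        return slice.slice();
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(16);
        buf.put(new byte[]{1, 2, 3, 4, 5, 6}); // 6 bytes written so far
        ByteBuffer delta = unflushedPortion(buf, 4); // 4 already flushed
        System.out.println(delta.remaining()); // prints 2
    }
}
{code}

Only the 2-byte delta would then go into the WriteChunk request, instead of all 6 buffered bytes.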
It looks like the full-buffer transfer exists mostly for the (ease of) DataNode-side checksum verification:

[https://github.com/apache/ozone/blob/f563d676dc6b6cb9e0ed5d288b94e7660a2584c1/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueHandler.java#L891]
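One way verification could stay cheap with incremental transfers (a hypothetical sketch, not Ozone's actual checksum code): if checksums are kept per fixed-size segment (bytesPerChecksum), the DN only needs to re-verify the segments that the newly received bytes touch, not the whole chunk:

{code:java}
import java.util.zip.CRC32;

public class SegmentChecksums {
    // Hypothetical helper: one CRC per bytesPerChecksum-sized segment,
    // so an incremental write only invalidates the segments it overlaps.
    static long[] perSegmentCrcs(byte[] data, int bytesPerChecksum) {
        int n = (data.length + bytesPerChecksum - 1) / bytesPerChecksum;
        long[] crcs = new long[n];
        for (int i = 0; i < n; i++) {
            CRC32 crc = new CRC32();
            int off = i * bytesPerChecksum;
            crc.update(data, off, Math.min(bytesPerChecksum, data.length - off));
            crcs[i] = crc.getValue();
        }
        return crcs;
    }

    public static void main(String[] args) {
        long[] crcs = perSegmentCrcs(new byte[10], 4);
        System.out.println(crcs.length); // prints 3 (segments of 4, 4, 2 bytes)
    }
}
{code}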

 

As for the actual chunk *write to disk* on a DataNode, it does *seem* to use the chunk offset (the offset stored in ChunkInfo):

[https://github.com/apache/ozone/blob/274da83cfe00b5bea89fd728f74007936183fbde/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/helpers/ChunkUtils.java#L141-L142]

We need to double-check that it is indeed not overwriting the existing chunk data (which would be wasted work).
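For reference, an offset-based write like the one ChunkUtils appears to perform can be sketched with java.nio (a simplified illustration, not the actual DataNode code); a positional write at the chunk offset leaves the earlier bytes untouched:

{code:java}
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class OffsetWrite {
    // Writes data at the given offset without rewriting earlier bytes.
    static void writeAt(Path file, ByteBuffer data, long offset) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            while (data.hasRemaining()) {
                offset += ch.write(data, offset); // positional write
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path f = Files.createTempFile("chunk", ".bin");
        writeAt(f, ByteBuffer.wrap(new byte[]{1, 2, 3, 4}), 0);
        writeAt(f, ByteBuffer.wrap(new byte[]{5, 6}), 4); // only the delta
        System.out.println(Files.size(f)); // prints 6
        Files.delete(f);
    }
}
{code}

If the on-disk path already honors the offset like this, only the checksum handling stands in the way of sending the delta alone.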

 

Goal: eliminate this duplicate transfer.

 

Note: If protocol changes are needed for this, it will likely be a new protocol message rather than a modification of the existing WriteChunkProto.



> [hsync] Improve WriteChunk to send incremental data only
> --------------------------------------------------------
>
>                 Key: HDDS-11595
>                 URL: https://issues.apache.org/jira/browse/HDDS-11595
>             Project: Apache Ozone
>          Issue Type: Sub-task
>            Reporter: Siyao Meng
>            Assignee: Siyao Meng
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
