[
https://issues.apache.org/jira/browse/HDDS-11595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Siyao Meng updated HDDS-11595:
------------------------------
Description:
Update:
Upon closer inspection, the {{chunk/data}} passed into
{{BOS#writeChunkToContainer}} is in fact not the whole chunk buffer
accumulated from the beginning (default 4 MB max), but an incremental buffer
that is reset every time a flush happens:
[https://github.com/apache/ozone/blob/b23981cbdaf27134d9a622548977f10256e41d44/hadoop-hdds/client/src/main/java/org/apache/hadoop/hdds/scm/storage/BlockOutputStream.java#L747-L749]
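The buffering pattern at the link above can be sketched roughly as follows. This is a hypothetical illustration (class and method names invented, not the actual {{BlockOutputStream}} code): only the bytes written since the last flush are handed out, and the buffer resets afterwards, so each WriteChunk payload is an incremental delta.
{code:java}
import java.nio.ByteBuffer;

// Hypothetical sketch: buffer only the bytes written since the last
// flush, so a flush sends an incremental delta, not the whole chunk.
class IncrementalChunkBuffer {
    private ByteBuffer buffer = ByteBuffer.allocate(4 * 1024 * 1024);

    // Append newly written bytes since the last flush.
    void write(byte[] data) {
        buffer.put(data);
    }

    // Hand out only the un-flushed delta, then reset for the next round.
    ByteBuffer takeForFlush() {
        buffer.flip();
        ByteBuffer delta = buffer.duplicate();
        buffer = ByteBuffer.allocate(buffer.capacity());
        return delta;
    }
}
{code}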
Other points about protocol inefficiency below may still be valid. Closing
this jira.
----
Original description (for reference only):
From the way it looks, {{BOS#WriteChunk}} currently always sends the entire
client chunk buffer to the DataNode every single time it is called (e.g.
during hsync). That is quite inefficient.
{{data}} here is the entire client key block chunk buffer, rather than just
the newly written part not yet flushed to DNs (which would be the ideal
behavior):
[https://github.com/apache/ozone/blob/b23981cbdaf27134d9a622548977f10256e41d44/hadoop-hdds/client/src/main/java/org/apache/hadoop/hdds/scm/storage/BlockOutputStream.java#L950-L951]
{code:java}
asyncReply = writeChunkAsync(xceiverClient, chunkInfo,
    blockID.get(), data, tokenString, replicationIndex, blockData, close);
{code}
It looks like the full-buffer transfer exists mostly to ease DataNode-side
checksum verification:
[https://github.com/apache/ozone/blob/f563d676dc6b6cb9e0ed5d288b94e7660a2584c1/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueHandler.java#L891]
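Conceptually, that verification recomputes a checksum over the received payload and compares it with the value carried alongside the chunk. A minimal sketch, not the actual {{KeyValueHandler}} code, and using CRC32 purely as a stand-in for Ozone's configurable checksum types:
{code:java}
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

// Sketch: recompute a checksum over the received bytes and compare it
// with the checksum the client sent with the chunk.
class ChecksumSketch {
    static boolean verify(ByteBuffer data, long expectedChecksum) {
        CRC32 crc = new CRC32();
        crc.update(data.duplicate()); // duplicate() so caller's position is untouched
        return crc.getValue() == expectedChecksum;
    }
}
{code}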
As for the actual chunk *write to disk* on a DataNode, it does *seem* to use
the chunk offset stored in {{ChunkInfo}}:
[https://github.com/apache/ozone/blob/274da83cfe00b5bea89fd728f74007936183fbde/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/helpers/ChunkUtils.java#L141-L142]
Need to double check that it is indeed *not overwriting* the existing chunk
data (overwriting would be harmless, but wasted work).
Goal: Eliminate this duplicate transfer.
Note: If protocol changes are needed, this will likely be a new message
rather than a change to the existing WriteChunkProto.
> [hsync] Improve WriteChunk to send incremental data only
> --------------------------------------------------------
>
> Key: HDDS-11595
> URL: https://issues.apache.org/jira/browse/HDDS-11595
> Project: Apache Ozone
> Issue Type: Sub-task
> Reporter: Siyao Meng
> Assignee: Siyao Meng
> Priority: Major
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)