[ 
https://issues.apache.org/jira/browse/HDDS-14223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Andika updated HDDS-14223:
-------------------------------
    Description: 
Streaming Write Pipeline Approach 2 was proposed in the initial Ozone 
Streaming Write Pipeline design doc 
(https://issues.apache.org/jira/secure/attachment/13015193/Ozone%20Write%20Streaming.pdf)
 to improve performance over phase 1.

The idea is that instead of sending PutBlock through AsyncApi#send (which 
sends to the leader rather than the primary), we send PutBlock (metadata) 
through the stream itself (i.e. DataStreamOutput#writeAsync), sharing it with 
the WriteChunk (data) writeAsync calls. We can even force a SYNC as part of 
the PutBlock.
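
To make the idea concrete, here is a minimal, self-contained sketch of a single 
ordered stream that carries both data frames and a final metadata frame with a 
sync flag. Every type in it (SketchStream, FrameType, SketchOption) is a 
hypothetical stand-in invented for illustration; none of them are the Ratis or 
Ozone classes, and the real DataStreamOutput#writeAsync and 
StandardWriteOption.SYNC may behave differently.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Illustration only: all nested types are hypothetical, not Ratis/Ozone APIs.
final class StreamingPutBlockSketch {
  enum SketchOption { NONE, SYNC }

  enum FrameType { WRITE_CHUNK, PUT_BLOCK }

  /** Models the receiving end of one stream; frames are applied in arrival order. */
  static final class SketchStream {
    private final List<String> log = new ArrayList<>();
    private long bytesReceived = 0; // data seen so far on this stream
    private long bytesSynced = 0;   // data covered by the last SYNC

    CompletableFuture<Long> writeAsync(FrameType type, ByteBuffer payload,
        SketchOption option) {
      if (type == FrameType.WRITE_CHUNK) {
        int n = payload.remaining();
        bytesReceived += n;
        log.add("WRITE_CHUNK " + n);
      } else {
        // The metadata frame carries the block length the client wants to commit.
        long length = payload.getLong();
        if (length > bytesReceived) {
          // Cannot happen if metadata is sent after the data on the same stream.
          CompletableFuture<Long> failed = new CompletableFuture<>();
          failed.completeExceptionally(new IllegalStateException(
              "PutBlock ahead of data: " + length + " > " + bytesReceived));
          return failed;
        }
        if (option == SketchOption.SYNC) {
          bytesSynced = bytesReceived; // stand-in for an fsync of the chunk data
        }
        log.add("PUT_BLOCK length=" + length + " synced=" + bytesSynced);
      }
      return CompletableFuture.completedFuture(bytesReceived);
    }

    List<String> log() {
      return log;
    }
  }

  /** Writes two data chunks, then the metadata frame with SYNC, on one stream. */
  static List<String> demo() {
    SketchStream stream = new SketchStream();
    long total = 0;
    for (String chunk : new String[] {"hello", "world"}) {
      ByteBuffer data = ByteBuffer.wrap(chunk.getBytes(StandardCharsets.UTF_8));
      total += data.remaining();
      stream.writeAsync(FrameType.WRITE_CHUNK, data, SketchOption.NONE).join();
    }
    ByteBuffer meta = ByteBuffer.allocate(Long.BYTES).putLong(total);
    meta.flip();
    stream.writeAsync(FrameType.PUT_BLOCK, meta, SketchOption.SYNC).join();
    return stream.log();
  }

  public static void main(String[] args) {
    demo().forEach(System.out::println);
  }
}
```

Because the metadata frame travels on the same ordered stream as the data, the 
receiver in this sketch can check the committed length against the bytes it has 
already seen without a separate request path.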

Two patches, HDDS-6500 and HDDS-6137, added support for sending PutBlock 
during stream close. However, there are some issues in the implementation, as 
mentioned in HDDS-12007.

Furthermore, we are still mixing Approach 1 (Separate Metadata and Data), 
which is used at flush boundaries and on close, with Approach 2, which is used 
only on close. In my opinion, we should stick with one approach.

There are a few things to revisit:
* We might need to introduce a new Ratis internal mechanism to handle a write 
"commit", or we might be able to reuse StandardWriteOption.SYNC.
* We need to see whether we actually need Raft at all, or whether Ratis 
streaming can be used to implement a general write pipeline (e.g. chain 
replication, CRAQ, HDFS write pipeline).





  was:
Streaming Write Pipeline Approach 2 was proposed in the initial Ozone 
Streaming Write Pipeline design doc 
(https://issues.apache.org/jira/secure/attachment/13015193/Ozone%20Write%20Streaming.pdf)
 to improve performance over phase 1.

The idea is that instead of sending PutBlock through AsyncApi#send (which 
sends to the leader rather than the primary), we send PutBlock (metadata) 
through the stream itself (i.e. DataStreamOutput#writeAsync), sharing it with 
the WriteChunk (data) writeAsync calls.

Two patches, HDDS-6500 and HDDS-6137, added support for sending PutBlock 
during stream close. However, there are some issues in the implementation, as 
mentioned in HDDS-12007.

Furthermore, we are still mixing Approach 1 (Separate Metadata and Data), 
which is used at flush boundaries and on close, with Approach 2, which is used 
only on close. In my opinion, we should stick with one approach.

There are a few things to revisit:
* We might need to introduce a new Ratis internal mechanism to handle a write 
"commit".
* We need to see whether we actually need Raft at all, or whether Ratis 
streaming can be used to implement a general write pipeline (e.g. chain 
replication, CRAQ, HDFS write pipeline).






> Revisit Approach 2: Streaming Path for both Data and MetaData 
> --------------------------------------------------------------
>
>                 Key: HDDS-14223
>                 URL: https://issues.apache.org/jira/browse/HDDS-14223
>             Project: Apache Ozone
>          Issue Type: Sub-task
>            Reporter: Ivan Andika
>            Assignee: Ivan Andika
>            Priority: Major
>
> Streaming Write Pipeline Approach 2 was proposed in the initial Ozone 
> Streaming Write Pipeline design doc 
> (https://issues.apache.org/jira/secure/attachment/13015193/Ozone%20Write%20Streaming.pdf)
>  to improve performance over phase 1.
> The idea is that instead of sending PutBlock through AsyncApi#send (which 
> sends to the leader rather than the primary), we send PutBlock (metadata) 
> through the stream itself (i.e. DataStreamOutput#writeAsync), sharing it 
> with the WriteChunk (data) writeAsync calls. We can even force a SYNC as 
> part of the PutBlock.
> Two patches, HDDS-6500 and HDDS-6137, added support for sending PutBlock 
> during stream close. However, there are some issues in the implementation, 
> as mentioned in HDDS-12007.
> Furthermore, we are still mixing Approach 1 (Separate Metadata and Data), 
> which is used at flush boundaries and on close, with Approach 2, which is 
> used only on close. In my opinion, we should stick with one approach.
> There are a few things to revisit:
> * We might need to introduce a new Ratis internal mechanism to handle a write 
> "commit", or we might be able to reuse StandardWriteOption.SYNC.
> * We need to see whether we actually need Raft at all, or whether Ratis 
> streaming can be used to implement a general write pipeline (e.g. chain 
> replication, CRAQ, HDFS write pipeline).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

