[ 
https://issues.apache.org/jira/browse/HDDS-12578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17947237#comment-17947237
 ] 

Ivan Andika commented on HDDS-12578:
------------------------------------

> Do you mean Ratis Streaming?

While Ratis streaming has some similarities with CRAQ (e.g. it supports a 
pipeline topology), there are some major differences. CRAQ requires a 
coordinator to decide which node is the chain head and which is the tail, 
since a chain must have a fixed head node and tail node, whereas in Ratis 
streaming any datanode can be the "primary" and the chain topology is sent by 
the client in a RoutingTable object. Therefore, Ratis streaming still needs 
the Raft consensus protocol to guarantee strong consistency during the write 
commit phase (sendForward), while the CRAQ algorithm is strongly consistent 
in itself. The downside is that a CRAQ pipeline cannot acknowledge a write 
once only a subset of replicas, such as a majority quorum, has it (unlike 
Raft), since the write must be propagated from the head all the way to the 
tail before it is acknowledged to the writer. Additionally, it requires a 
central coordinator to store the chain table (similar to the SCM pipeline 
table, but with more info regarding the chain topology).
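
To make the head-to-tail propagation concrete, here is a minimal in-memory 
sketch in Python. It is only an illustration of the general CRAQ idea, not 
Ozone or Ratis code: the Node and Chain classes, the fail_at hook and the 
datanode names are all hypothetical, and the real protocol returns a committed 
version number from the tail rather than the value itself.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One replica in the chain (hypothetical model, not an Ozone class)."""
    name: str
    committed: dict = field(default_factory=dict)  # key -> (version, value), clean
    pending: dict = field(default_factory=dict)    # key -> (version, value), dirty


class Chain:
    """A fixed head-to-tail chain, as a coordinator would assign it."""

    def __init__(self, names):
        self.nodes = [Node(n) for n in names]
        self.version = 0

    @property
    def tail(self):
        return self.nodes[-1]

    def write(self, key, value, fail_at=None):
        """Propagate a write from head to tail; ack only once the tail has it.

        fail_at is a hypothetical test hook: the name of the node at which
        propagation stops, leaving earlier nodes with a dirty version.
        """
        self.version += 1
        entry = (self.version, value)
        for node in self.nodes:
            if node.name == fail_at:
                return False  # never reached the tail, so no ack to the writer
            node.pending[key] = entry
        # The tail holds the write: commit there first, then back up the chain.
        for node in reversed(self.nodes):
            node.committed[key] = node.pending.pop(key)
        return True

    def read(self, key, at):
        """Read from any node; a node with a dirty version defers to the tail."""
        node = next(n for n in self.nodes if n.name == at)
        if key in node.pending:
            return self.tail.committed.get(key, (0, None))[1]
        return node.committed.get(key, (0, None))[1]


chain = Chain(["dn1", "dn2", "dn3"])          # dn1 is the head, dn3 the tail
acked = chain.write("block-1", "v1")          # reaches the tail, so it is acked
stalled = chain.write("block-1", "v2", fail_at="dn3")  # stops before the tail
print(acked, stalled, chain.read("block-1", at="dn1"))
```

In this sketch the second write never reaches the tail, so it is not 
acknowledged and a read at the head still returns the last committed value. 
That is the property that lets any node serve reads, and also the reason a 
write cannot be acknowledged after only a majority has it.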



> Ozone on CRAQ
> -------------
>
>                 Key: HDDS-12578
>                 URL: https://issues.apache.org/jira/browse/HDDS-12578
>             Project: Apache Ozone
>          Issue Type: Wish
>            Reporter: Ivan Andika
>            Assignee: Ivan Andika
>            Priority: Major
>
> This is just a long-term wish to explore Chain Replication or CRAQ on Ozone.
> Currently Ozone supports Raft based write pipeline and EC. From the Data 
> replication spectrum 
> ([https://transactional.blog/blog/2024-data-replication-design-spectrum]), 
> these two pipelines cover the Leader-based (Raft based write pipeline) and 
> Quorum-based (EC) replication algorithm. CRAQ falls under 
> Reconfiguration-based replication algorithms. 
> We can consider supporting CRAQ pipelines on Ozone. As mentioned in 
> discussion 
> [https://github.com/apache/ozone/discussions/6870#discussioncomment-9907706], 
> chained replication might be needed for rolling upgrade support. Although 
> CRAQ promises higher bandwidth, higher read performance, and strong 
> consistency, there are some drawbacks such as higher write latency (since all 
> writes need to propagate to the tail), higher downtime during node failure 
> (waiting for the control plane to reconfigure the chains), etc.
> The wish comes from the recent DeepSeek 3FS distributed file system that uses 
> CRAQ as its main write pipeline 
> ([https://github.com/deepseek-ai/3FS/blob/main/docs/design_notes.md]). Other 
> systems such as Meta's Delta 
> ([https://engineering.fb.com/2022/05/04/data-infrastructure/delta/]) also 
> use CRAQ.
> Since it is a Reconfiguration-based replication algorithm, there might be a 
> need to support ZooKeeper-like semantics on top of Ratis or Raft in SCM HA, 
> similar to Clickhouse Keeper ([https://clickhouse.com/clickhouse/keeper]) or 
> Meta's Zelos (https://engineering.fb.com/2022/06/08/developer-tools/zelos/)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
