[
https://issues.apache.org/jira/browse/HDDS-12578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18088396#comment-18088396
]
Ivan Andika commented on HDDS-12578:
------------------------------------
[~erose] I appreciate the evaluation as well. Just adding my two cents.
Yes, HDFS chain replication should be the best in terms of rolling upgrades
(since there is no extra state to manage) and in other respects, such as the
number of possible write pipelines (which should be any permutation of 3
DNs). However, stateless pipelines are only possible because the HDFS data
granularity is a block, whereas the Ozone granularity is a container (a
collection of blocks), which exists to solve the small files issue (a core
tenet of Ozone). Therefore, Ozone requires pipelines to be tracked so that
writes go to the set of 3 datanodes that hold the particular allocated
container.
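To make the contrast concrete, here is a rough sketch in Python. It assumes that "permutation of 3 DNs" means any ordered chain of 3 distinct datanodes picked from the cluster. The datanode names, the container ID, and the `container_pipeline` map are made up for illustration and are not Ozone or HDFS APIs.

```python
from itertools import permutations
from math import perm


def possible_chains(datanodes, replication=3):
    """Enumerate every ordered chain of `replication` distinct datanodes.

    With block granularity and no tracked state, any of these chains
    could serve a new write.
    """
    return list(permutations(datanodes, replication))


datanodes = ["dn1", "dn2", "dn3", "dn4"]  # hypothetical cluster
chains = possible_chains(datanodes)
# The count is n * (n - 1) * (n - 2), i.e. P(n, 3).
assert len(chains) == perm(len(datanodes), 3)

# With container granularity, a write has to reach the datanodes that
# already hold the allocated container, so the mapping must be tracked.
# Hypothetical map: container ID -> the 3 datanodes holding it.
container_pipeline = {1001: ("dn1", "dn2", "dn3")}


def pipeline_for_write(container_id):
    """Look up the tracked pipeline for an already allocated container."""
    return container_pipeline[container_id]
```

The first half needs no lookup at all, which is why the stateless approach is friendlier to rolling upgrades. The second half is the extra state that container granularity forces us to keep.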
> Ozone on CRAQ
> -------------
>
> Key: HDDS-12578
> URL: https://issues.apache.org/jira/browse/HDDS-12578
> Project: Apache Ozone
> Issue Type: Wish
> Reporter: Ivan Andika
> Assignee: Ivan Andika
> Priority: Major
> Attachments: screenshot-1.png
>
>
> This is just a long-term wish to explore Chain Replication or CRAQ on Ozone.
> Currently, Ozone supports a Raft-based write pipeline and EC. In the data
> replication spectrum
> ([https://transactional.blog/blog/2024-data-replication-design-spectrum]),
> these two pipelines cover the Leader-based (Raft-based write pipeline) and
> Quorum-based (EC) replication algorithms. CRAQ falls under
> Reconfiguration-based replication algorithms.
> We can consider supporting CRAQ pipelines on Ozone. As mentioned in the
> discussion
> [https://github.com/apache/ozone/discussions/6870#discussioncomment-9907706],
> chain replication might be needed for rolling upgrade support. Although
> CRAQ promises higher bandwidth, higher read performance, and strong
> consistency, there are some drawbacks such as higher write latency (since all
> writes need to propagate to the tail), higher downtime during node failure
> (waiting for the control plane to reconfigure the chains), etc.
> The wish comes from the recent DeepSeek 3FS distributed file system that uses
> CRAQ as its main write pipeline
> ([https://github.com/deepseek-ai/3FS/blob/main/docs/design_notes.md]). Other
> systems, such as Meta's Delta
> ([https://engineering.fb.com/2022/05/04/data-infrastructure/delta/]), also
> use CRAQ.
> Since CRAQ is a Reconfiguration-based replication algorithm, there might be a
> need to support ZooKeeper-like semantics on top of Ratis or Raft in SCM HA,
> similar to ClickHouse Keeper ([https://clickhouse.com/clickhouse/keeper]) or
> Meta's Zelos ([https://engineering.fb.com/2022/06/08/developer-tools/zelos/]).