szetszwo commented on code in PR #1338:
URL: https://github.com/apache/ratis/pull/1338#discussion_r2733429957


##########
ratis-docs/src/site/markdown/concept/index-v2.md:
##########
@@ -0,0 +1,499 @@
+<!---
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements.  See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License.  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+-->
+# Apache Ratis Concepts
+
+## Table of Contents
+
+1. [Overview of Raft and Apache Ratis](#overview-of-raft-and-apache-ratis)
+2. [Raft Cluster Topology](#raft-cluster-topology)
+3. [The Raft Log - Foundation of 
Consensus](#the-raft-log---foundation-of-consensus)
+4. [The State Machine - Your Application's 
Heart](#the-state-machine---your-applications-heart)
+5. [Consistency Models and Read 
Patterns](#consistency-models-and-read-patterns)
+6. [Snapshots - Managing Growth and 
Recovery](#snapshots---managing-growth-and-recovery)
+7. [Logical Organization of Ratis](#logical-organization-of-ratis)
+8. [Leadership and Fault Tolerance](#leadership-and-fault-tolerance)
+9. [Scaling with Multi-Raft Groups](#scaling-with-multi-raft-groups)
+
+## Overview of Raft and Apache Ratis
+
+The Raft consensus algorithm solves a fundamental problem in distributed 
systems: how do you get
+multiple computers to agree on a sequence of operations, even when some might 
fail or become
+unreachable? This problem, known as distributed consensus, is at the heart of 
building reliable
+distributed systems.
+
+Raft ensures that a cluster of servers maintains an identical, ordered log of 
operations. Each
+server applies these operations to its local state machine in the same order, 
guaranteeing that
+all servers end up with identical state. This approach, called state machine 
replication,
+provides both consistency and fault tolerance.
+
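+To make this concrete, here is a tiny, library-free sketch (plain Java, not the Ratis API) of the
+idea: replaying the same ordered log of operations always produces the same state, on every server.
+
+```java
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+// Illustration only: a replicated log of put(key, value) operations.
+// Applying the same log in the same order always yields the same map,
+// no matter which server performs the replay.
+class ReplayExample {
+  record Put(String key, String value) {}   // one "operation" in the log
+
+  static Map<String, String> replay(List<Put> log) {
+    Map<String, String> state = new HashMap<>();
+    for (Put op : log) {        // sequential, in log order
+      state.put(op.key(), op.value());
+    }
+    return state;               // deterministic: same log => same final state
+  }
+
+  public static void main(String[] args) {
+    List<Put> log = List.of(new Put("x", "1"), new Put("x", "2"));
+    System.out.println(replay(log));   // {x=2} on every server that replays this log
+  }
+}
+```
+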
+You should consider using Raft when your system needs strong consistency 
guarantees across
+multiple servers. This typically applies to systems where correctness is more 
important than
+absolute performance, such as distributed databases, configuration management 
systems, or any
+application where split-brain scenarios would be unacceptable.
+
+Apache Ratis is a Java library that implements the Raft consensus protocol. 
The key word here
+is "library" - Ratis is not a standalone service that you communicate with 
over the network.
+Instead, you embed Ratis directly into your Java application, and it becomes 
part of your
+application's runtime.
+
+This embedded approach creates tight integration between your application and 
the consensus
+mechanism. Your application and Ratis run in the same JVM, sharing memory and 
computational
+resources. Your application provides the business logic (the "state machine" 
in Raft terminology),
+while Ratis handles the distributed consensus mechanics needed to keep 
multiple instances of your
+application synchronized.
+
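+As a rough sketch of what this embedding looks like in code (the group, properties, and state
+machine are placeholders your application would supply; the builder calls follow the `RaftServer`
+API):
+
+```java
+import java.io.IOException;
+
+import org.apache.ratis.conf.RaftProperties;
+import org.apache.ratis.protocol.RaftGroup;
+import org.apache.ratis.protocol.RaftPeerId;
+import org.apache.ratis.server.RaftServer;
+import org.apache.ratis.statemachine.StateMachine;
+
+public class EmbeddedPeerSketch {
+  // Sketch: starting a Ratis peer inside your own application process.
+  // `group`, `properties` and `stateMachine` are created elsewhere by your application.
+  public static RaftServer startEmbeddedPeer(
+      RaftGroup group, RaftProperties properties, StateMachine stateMachine) throws IOException {
+    RaftServer server = RaftServer.newBuilder()
+        .setServerId(RaftPeerId.valueOf("n0"))  // this peer's id within the group
+        .setGroup(group)                        // the Raft group this peer joins
+        .setProperties(properties)              // storage directories, ports, timeouts, ...
+        .setStateMachine(stateMachine)          // your application's business logic
+        .build();
+    server.start();  // consensus now runs inside your JVM, next to your own code
+    return server;
+  }
+}
+```
+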
+## Raft Cluster Topology
+
+Understanding the basic building blocks of a Raft deployment matters because deployment
+decisions affect both the correctness and the performance of your system.
+
+### Servers, Clusters, and Groups
+
+A Raft server (also known as a "peer") is a single running instance of your 
application with
+Ratis embedded. Each server runs your state machine and participates in the 
consensus protocol.
+
+A Raft cluster is a physical collection of servers that can participate in 
consensus. A Raft
+group is a logical consensus domain that runs across a specific subset of 
peers in the cluster.
+At any given time, one peer in a group acts as the "leader" while the others 
are "followers" or
+"listeners". The leader handles all write requests and replicates operations 
to other peers in
+the group. Both leaders and followers can service read requests, with 
different consistency
+guarantees.
+
+A single cluster can host multiple independent Raft groups, each with its own 
leader election,
+consistency and state replication. Groups typically consist of an odd number 
of peers (3, 5, or
+7 are common) to ensure clear majority decisions.
+
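+The sketch below shows how a three-peer group might be described with the Ratis protocol classes;
+the peer ids and addresses are made up for illustration:
+
+```java
+import org.apache.ratis.protocol.RaftGroup;
+import org.apache.ratis.protocol.RaftGroupId;
+import org.apache.ratis.protocol.RaftPeer;
+
+public class GroupSketch {
+  // Sketch: a 3-peer group (tolerates 1 failed peer).
+  static RaftGroup threePeerGroup() {
+    RaftPeer p0 = RaftPeer.newBuilder().setId("n0").setAddress("host0:9872").build();
+    RaftPeer p1 = RaftPeer.newBuilder().setId("n1").setAddress("host1:9872").build();
+    RaftPeer p2 = RaftPeer.newBuilder().setId("n2").setAddress("host2:9872").build();
+
+    // Every peer (and client) must agree on the same group id; in practice a fixed
+    // UUID is usually used instead of a randomly generated one.
+    return RaftGroup.valueOf(RaftGroupId.randomId(), p0, p1, p2);
+  }
+}
+```
+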
+### Majority-Based Decision-Making
+
+Raft's safety guarantees depend on majority agreement within each group. The 
leader replicates
+each operation to the followers in its group, and an operation is committed once a majority
+of the N peers in that group (at least floor(N/2) + 1) acknowledge it. This means a group of
+3 peers can tolerate 1 failure, a group of 5 peers can tolerate 2 failures, and so on.
+
+This majority requirement affects both availability and performance. A group 
remains available as
+long as a majority of its peers are reachable and functioning. However, every 
transaction must
+wait for majority acknowledgment, so the slowest server in the majority 
determines your write
+latency.
+
+### Server Placement and Network Considerations
+
+The physical and network placement of your servers impacts both availability 
and performance.
+Placing all servers in the same rack or data center provides the lowest 
latency but risks
+creating a single point of failure. Distributing servers across multiple 
availability zones or
+data centers improves fault tolerance but can increase latency.
+
+A common approach is to place servers across multiple availability zones 
within a single region
+for a balance of fault tolerance and performance. For applications requiring 
geographic
+distribution, you might place servers in different regions, accepting higher 
latency in exchange
+for better disaster recovery capabilities.
+
+## The Raft Log - Foundation of Consensus
+
+The Raft log is the central data structure that makes distributed consensus 
possible. Each server
+in a Raft group maintains its own copy of this append-only ledger, which records every
+operation in the exact order in which it should be applied to the state machine.
+
+Each entry in the log contains three key pieces of information: the operation 
itself (what should
+be done), a log index (a sequential number indicating the entry's position), 
and a term number
+(the period during which a leader created this entry). Terms represent periods 
of leadership and
+increase each time a new leader is elected, preventing old leaders from 
overwriting newer entries.
+The combination of the term and log index is referred to as a term-index 
(`TermIndex`) and 
+establishes the ordering of entries in the log.
+
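+A small illustrative sketch of this ordering, using made-up term and index values:
+
+```java
+import org.apache.ratis.server.protocol.TermIndex;
+
+public class TermIndexSketch {
+  public static void main(String[] args) {
+    // Sketch: TermIndex ordering (the numbers are made up).
+    // A higher term orders later regardless of index; within the same term,
+    // the higher index orders later.
+    TermIndex a = TermIndex.valueOf(2, 105);  // term 2, index 105
+    TermIndex b = TermIndex.valueOf(3, 101);  // term 3, index 101
+    TermIndex c = TermIndex.valueOf(3, 102);  // term 3, index 102
+
+    assert a.compareTo(b) < 0;  // term 2 < term 3
+    assert b.compareTo(c) < 0;  // same term, index 101 < 102
+  }
+}
+```
+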
+The log serves as both the mechanism for replication (leaders send log entries 
to followers) and
+the source of truth for recovery (servers can rebuild their state by replaying 
the log). When we
+talk about "committing" an operation, we mean that a majority of servers have 
acknowledged
+storing that log entry, making it safe to apply to the state machine.
+
+## The State Machine - Your Application's Heart
+
+In Ratis, the state machine is your application's primary integration point: it implements
+your business logic and data storage operations.
+
+The state machine is not a finite state machine with states and transitions. 
Instead, it's a
+deterministic computation engine that processes a sequence of operations and 
maintains some
+internal state. The state machine must be deterministic: given the same 
sequence of operations,
+it must always produce the same results and end up in the same final state. 
Operations are
+processed sequentially, one at a time, in the order they appear in the Raft 
log.
+
+### State Machine Responsibilities
+
+Your state machine has three primary responsibilities. First, it processes 
Raft transactions by
+validating incoming requests before they're replicated and applying committed 
operations to your
+application state. Second, it maintains your application's actual data, which 
might be an
+in-memory data structure, a local database, files on disk, or any combination 
of these. Third,
+it creates point-in-time representations of its state (snapshots) and can 
restore its state from
+snapshots during recovery.
+
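+The following minimal sketch, loosely modeled on the counter example shipped with Ratis, shows
+where these responsibilities surface in code; the class itself is hypothetical and keeps its
+entire application data in a single in-memory counter:
+
+```java
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.atomic.AtomicLong;
+
+import org.apache.ratis.proto.RaftProtos.LogEntryProto;
+import org.apache.ratis.protocol.Message;
+import org.apache.ratis.statemachine.TransactionContext;
+import org.apache.ratis.statemachine.impl.BaseStateMachine;
+
+// Sketch: a state machine whose entire "application data" is one counter.
+public class CounterSketchStateMachine extends BaseStateMachine {
+  private final AtomicLong counter = new AtomicLong();  // responsibility 2: the data
+
+  // Responsibility 1: apply each committed operation, sequentially, in log order.
+  @Override
+  public CompletableFuture<Message> applyTransaction(TransactionContext trx) {
+    final LogEntryProto entry = trx.getLogEntry();
+    counter.incrementAndGet();  // the only operation this sketch understands
+    updateLastAppliedTermIndex(entry.getTerm(), entry.getIndex());
+    return CompletableFuture.completedFuture(
+        Message.valueOf(String.valueOf(counter.get())));
+  }
+
+  // Read-only queries are answered from local state and never touch the Raft log.
+  @Override
+  public CompletableFuture<Message> query(Message request) {
+    return CompletableFuture.completedFuture(
+        Message.valueOf(String.valueOf(counter.get())));
+  }
+
+  // Responsibility 3 (snapshots) would override takeSnapshot() and the
+  // snapshot-loading part of initialize(); it is omitted from this sketch.
+}
+```
+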
+### The State Machine Lifecycle
+
+The state machine operates at two different lifecycle levels: an overall peer 
lifecycle and a
+per-transaction processing lifecycle.
+
+#### Peer Lifecycle
+
+During initialization, when a peer starts up, the state machine loads any 
existing snapshots and
+prepares its internal data structures. The Raft layer then replays any log 
entries that occurred
+after the snapshot, bringing the peer up to the current state of the group.
+
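+As a rough sketch, the startup hook might look like the following; the snapshot-loading helper
+here is hypothetical and application-specific:
+
+```java
+import java.io.IOException;
+
+import org.apache.ratis.protocol.RaftGroupId;
+import org.apache.ratis.server.RaftServer;
+import org.apache.ratis.server.storage.RaftStorage;
+import org.apache.ratis.statemachine.impl.BaseStateMachine;
+import org.apache.ratis.statemachine.impl.SimpleStateMachineStorage;
+
+public class LifecycleSketchStateMachine extends BaseStateMachine {
+  private final SimpleStateMachineStorage snapshotStorage = new SimpleStateMachineStorage();
+
+  // Called once when the peer starts, before any log entries are replayed.
+  @Override
+  public void initialize(RaftServer server, RaftGroupId groupId, RaftStorage raftStorage)
+      throws IOException {
+    super.initialize(server, groupId, raftStorage);
+    snapshotStorage.init(raftStorage);  // where snapshot files live on this peer
+    loadLatestSnapshotIfAny();          // hypothetical helper: rebuild in-memory state
+    // After this returns, Ratis replays any log entries newer than the snapshot.
+  }
+
+  private void loadLatestSnapshotIfAny() {
+    // Application-specific: read the newest snapshot file, if one exists,
+    // and restore the in-memory data structures from it.
+  }
+}
+```
+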
+During normal operation, the state machine continuously processes transactions 
as they're
+committed by the Raft group, responds to leadership changes, and handles 
read-only queries. For

Review Comment:
   > ... the state machine implements LeaderEventApi (or FollowerEventApi) ...
   
   Yes, you are right that a state machine can implement those APIs.  I mean 
that supporting those APIs is optional.


