This is an automated email from the ASF dual-hosted git repository.

jsancio pushed a commit to branch 3.3
in repository https://gitbox.apache.org/repos/asf/kafka.git


The following commit(s) were added to refs/heads/3.3 by this push:
     new 96869af7af KAFKA-14205; Document how to replace the disk for the KRaft 
Controller (#12597)
96869af7af is described below

commit 96869af7af952b24db2aa9347d867153703a8ae5
Author: José Armando García Sancio <[email protected]>
AuthorDate: Mon Sep 12 16:57:54 2022 -0700

    KAFKA-14205; Document how to replace the disk for the KRaft Controller 
(#12597)
    
    Document process for recovering and formatting the metadata log directory 
for the KRaft controller.
    
    Reviewers: Colin Patrick McCabe <[email protected]>, Jason Gustafson 
<[email protected]>
---
 docs/ops.html | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/docs/ops.html b/docs/ops.html
index 1854cf057c..da13ad9b44 100644
--- a/docs/ops.html
+++ b/docs/ops.html
@@ -1373,6 +1373,27 @@ $ bin/kafka-acls.sh \
     <li>delalloc: Delayed allocation means that the filesystem avoids allocating any blocks until the physical write occurs. This allows ext4 to allocate a large extent instead of smaller pages and helps ensure the data is written sequentially. This feature is great for throughput. It does seem to involve some locking in the filesystem which adds a bit of latency variance.
   </ul>
 
+  <h4 class="anchor-heading"><a id="replace_disk" class="anchor-link"></a><a 
href="#replace_disk">Replace KRaft Controller Disk</a></h4>
+  <p>When Kafka is configured to use KRaft, the controllers store the cluster 
metadata in the directory specified in <code>metadata.log.dir</code> -- or the 
first log directory, if <code>metadata.log.dir</code> is not configured. See 
the documentation for <code>metadata.log.dir</code> for details.</p>
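+
+  <p>As a minimal sketch (the node id, voter list, and path below are placeholders, and listener settings are omitted), a dedicated controller could pin the metadata log directory explicitly in its properties file:</p>
+
+  <pre class="line-numbers"><code class="language-text">process.roles=controller
+node.id=1
+controller.quorum.voters=1@controller1:9093,2@controller2:9093,3@controller3:9093
+metadata.log.dir=/var/lib/kafka/kraft-controller-metadata</code></pre>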
+
+  <p>If the data in the cluster metadata directory is lost, either because of a hardware failure or because the hardware needs to be replaced, care should be taken when provisioning the new controller node. The new controller node should not be formatted and started until a majority of the controllers have all of the committed data. To determine whether a majority of the controllers have the committed data, run the <code>kafka-metadata-quorum.sh</code> tool to describe the replication status:</p>
+
+  <pre class="line-numbers"><code class="language-bash"> &gt; 
bin/kafka-metadata-quorum.sh --bootstrap-server broker_host:port describe 
--replication
+ NodeId  LogEndOffset    Lag     LastFetchTimestamp      LastCaughtUpTimestamp 
  Status
+ 1       25806           0       1662500992757           1662500992757         
  Leader
+ ...     ...             ...     ...                     ...                   
  ...
+  </code></pre>
+
+  <p>Check and wait until the <code>Lag</code> is small for a majority of the controllers. If the leader's end offset is not increasing, you can wait until the lag is 0 for a majority; otherwise, you can pick the latest leader end offset and wait until all replicas have reached it. Check and wait until the <code>LastFetchTimestamp</code> and <code>LastCaughtUpTimestamp</code> are close to each other for the majority of the controllers. At this point it is safer to format the controller's metadata log directory. This can be done with the <code>kafka-storage.sh</code> command:</p>
+
+  <pre class="line-numbers"><code class="language-bash"> &gt; 
bin/kafka-storage.sh format --cluster-id uuid --config 
server_properties</code></pre>
+
+  <p>It is possible for the <code>bin/kafka-storage.sh format</code> command above to fail with a message like <code>Log directory ... is already formatted</code>. This can happen when combined mode is used and only the metadata log directory was lost but not the others. In that case, and only in that case, you can run the <code>kafka-storage.sh format</code> command with the <code>--ignore-formatted</code> option.</p>
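+
+  <p>For example, reusing the placeholder cluster id and properties file from the command above, the invocation with this option might look like:</p>
+
+  <pre class="line-numbers"><code class="language-bash"> &gt; bin/kafka-storage.sh format --cluster-id uuid --config server_properties --ignore-formatted</code></pre>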
+
+  <p>Start the KRaft controller after formatting the log directories.</p>
+
+  <pre class="line-numbers"><code class="language-bash"> &gt; 
/bin/kafka-server-start.sh server_properties</code></pre>
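+
+  <p>Once the controller is running, you may want to re-run the <code>kafka-metadata-quorum.sh</code> command shown above to verify that the replaced controller is fetching from the leader and that its <code>Lag</code> is decreasing:</p>
+
+  <pre class="line-numbers"><code class="language-bash"> &gt; bin/kafka-metadata-quorum.sh --bootstrap-server broker_host:port describe --replication</code></pre>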
+
   <h3 class="anchor-heading"><a id="monitoring" class="anchor-link"></a><a 
href="#monitoring">6.8 Monitoring</a></h3>
 
   Kafka uses Yammer Metrics for metrics reporting in the server. The Java 
clients use Kafka Metrics, a built-in metrics registry that minimizes 
transitive dependencies pulled into client applications. Both expose metrics 
via JMX and can be configured to report stats using pluggable stats reporters 
to hook up to your monitoring system.
