This is an automated email from the ASF dual-hosted git repository.
jsancio pushed a commit to branch 3.3
in repository https://gitbox.apache.org/repos/asf/kafka.git
The following commit(s) were added to refs/heads/3.3 by this push:
new 96869af7af KAFKA-14205; Document how to replace the disk for the KRaft Controller (#12597)
96869af7af is described below
commit 96869af7af952b24db2aa9347d867153703a8ae5
Author: José Armando García Sancio <[email protected]>
AuthorDate: Mon Sep 12 16:57:54 2022 -0700
KAFKA-14205; Document how to replace the disk for the KRaft Controller (#12597)
Document process for recovering and formatting the metadata log directory
for the KRaft controller.
Reviewers: Colin Patrick McCabe <[email protected]>, Jason Gustafson
<[email protected]>
---
docs/ops.html | 21 +++++++++++++++++++++
1 file changed, 21 insertions(+)
diff --git a/docs/ops.html b/docs/ops.html
index 1854cf057c..da13ad9b44 100644
--- a/docs/ops.html
+++ b/docs/ops.html
@@ -1373,6 +1373,27 @@ $ bin/kafka-acls.sh \
<li>delalloc: Delayed allocation means that the filesystem avoids allocating any blocks until the physical write occurs. This allows ext4 to allocate a large extent instead of smaller pages and helps ensure the data is written sequentially. This feature is great for throughput. It does seem to involve some locking in the filesystem which adds a bit of latency variance. (A sketch of a mount entry using this option follows this list.)
</ul>
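  <p>As a rough, hypothetical sketch (the device name and mount point are placeholders, not recommendations), a mount entry for a Kafka data volume using this option might look like:</p>

  <pre class="line-numbers"><code class="language-bash">  # Hypothetical /etc/fstab entry for an ext4 Kafka data volume.
  # delalloc is the ext4 default; it is listed explicitly only to make the choice visible.
  /dev/sdb1  /var/lib/kafka-logs  ext4  defaults,delalloc  0  2</code></pre>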
+ <h4 class="anchor-heading"><a id="replace_disk" class="anchor-link"></a><a href="#replace_disk">Replace KRaft Controller Disk</a></h4>
+ <p>When Kafka is configured to use KRaft, the controllers store the cluster metadata in the directory specified in <code>metadata.log.dir</code>, or in the first log directory if <code>metadata.log.dir</code> is not configured. See the documentation for <code>metadata.log.dir</code> for details.</p>
+
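  <p>For illustration only, a minimal sketch of the relevant lines in a controller's properties file might look like the following; all paths, IDs, and host names are placeholders rather than recommendations:</p>

  <pre class="line-numbers"><code class="language-bash">  # Sketch of a KRaft controller configuration; every value here is a placeholder.
  process.roles=controller
  node.id=1
  controller.quorum.voters=1@controller1:9093,2@controller2:9093,3@controller3:9093
  # Cluster metadata is stored here; if unset, the first directory in log.dirs is used instead.
  metadata.log.dir=/var/lib/kafka/metadata
  log.dirs=/var/lib/kafka/data</code></pre>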
+ <p>If the data in the cluster metadata directory is lost, either because of a hardware failure or because the hardware needs to be replaced, care should be taken when provisioning the new controller node. The new controller node should not be formatted and started until a majority of the controllers have all of the committed data. To determine whether a majority of the controllers have the committed data, run the <code>kafka-metadata-quorum.sh</code> tool to describe the replication status:</p>
+
+ <pre class="line-numbers"><code class="language-bash">  > bin/kafka-metadata-quorum.sh --bootstrap-server broker_host:port describe --replication
+ NodeId   LogEndOffset    Lag     LastFetchTimestamp      LastCaughtUpTimestamp   Status
+ 1        25806           0       1662500992757           1662500992757           Leader
+ ...      ...             ...     ...                     ...                     ...
+ </code></pre>
+
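  <p>As a rough sketch (not part of the tool itself, and assuming the default tabular output shown above, where the third column is <code>Lag</code>), the number of caught-up controllers could be checked from a shell like this:</p>

  <pre class="line-numbers"><code class="language-bash">  # Sketch: count how many quorum replicas currently report a lag of zero.
  bin/kafka-metadata-quorum.sh --bootstrap-server broker_host:port describe --replication \
    | awk 'NR > 1 && $3 == 0 { caught_up++ } END { print caught_up+0, "replica(s) with zero lag" }'</code></pre>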
+ <p>Check and wait until the <code>Lag</code> is small for a majority of the controllers. If the leader's end offset is not increasing, you can wait until the lag is 0 for a majority; otherwise, you can pick the latest leader end offset and wait until all replicas have reached it. Check and wait until the <code>LastFetchTimestamp</code> and <code>LastCaughtUpTimestamp</code> are close to each other for a majority of the controllers. At this point it is safer to format the controller's metadata log directory. This can be done by running the <code>kafka-storage.sh</code> command:</p>
+
+ <pre class="line-numbers"><code class="language-bash">  > bin/kafka-storage.sh format --cluster-id uuid --config server_properties</code></pre>
+
+ <p>It is possible for the <code>bin/kafka-storage.sh format</code> command above to fail with a message like <code>Log directory ... is already formatted</code>. This can happen when combined mode is used and only the metadata log directory was lost but not the others. In that case, and only in that case, you can run the <code>kafka-storage.sh format</code> command with the <code>--ignore-formatted</code> option.</p>
+
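  <p>In that combined-mode case, the format command shown above would be rerun with the extra option, for example:</p>

  <pre class="line-numbers"><code class="language-bash">  > bin/kafka-storage.sh format --cluster-id uuid --config server_properties --ignore-formatted</code></pre>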
+ <p>Start the KRaft controller after formatting the log directories.</p>
+
+ <pre class="line-numbers"><code class="language-bash">  > bin/kafka-server-start.sh server_properties</code></pre>
+
<h3 class="anchor-heading"><a id="monitoring" class="anchor-link"></a><a href="#monitoring">6.8 Monitoring</a></h3>
Kafka uses Yammer Metrics for metrics reporting in the server. The Java
clients use Kafka Metrics, a built-in metrics registry that minimizes
transitive dependencies pulled into client applications. Both expose metrics
via JMX and can be configured to report stats using pluggable stats reporters
to hook up to your monitoring system.
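  As a small sketch of what the previous sentence describes (the port number and reporter class below are placeholders, not defaults), JMX can be exposed when starting a broker, and a pluggable reporter can be configured in the server properties:

  <pre class="line-numbers"><code class="language-bash">  # Expose JMX on a placeholder port when starting the broker (sketch only).
  JMX_PORT=9999 bin/kafka-server-start.sh config/server.properties

  # In server.properties: plug in a custom metrics reporter (class name is hypothetical).
  # metric.reporters=com.example.MyMetricsReporter</code></pre>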