[GitHub] [kafka] tinaselenge commented on a diff in pull request #14382: KAFKA-15442: add a section in doc for tiered storage

via GitHub Thu, 14 Sep 2023 01:02:02 -0700


tinaselenge commented on code in PR #14382:
URL: https://github.com/apache/kafka/pull/14382#discussion_r1325541025



##########
docs/ops.html:
##########
@@ -3859,6 +3859,98 @@ <h3>Finalizing the migration</h3>
 
 # Other configs ...</pre>
 
+
+<h3 class="anchor-heading"><a id="tiered_storage" class="anchor-link"></a><a href="#tiered_storage">6.11 Tiered Storage</a></h3>
+
+<h4 class="anchor-heading"><a id="tiered_storage_overview" 
class="anchor-link"></a><a href="#tiered_storage_overview">Tiered Storage 
Overview</a></h4>
+
+<p>Kafka data is mostly consumed in a streaming fashion using tail reads. Tail reads leverage the OS's page cache to serve the data instead of disk reads.
+  Older data is typically read from the disk for backfill or failure recovery purposes, and such reads are infrequent.</p>
+
+<p>In the tiered storage approach, a Kafka cluster is configured with two tiers of storage: local and remote.
+  The local tier is the same as in current Kafka: it uses the local disks on the Kafka brokers to store the log segments.
+  The new remote tier uses external storage systems, such as HDFS or S3, to store the completed log segments.
+  Please check <a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-405%3A+Kafka+Tiered+Storage">KIP-405</a> for more information.
+</p>
+
+<p><b>Note: Tiered storage is considered an early access feature and is not recommended for use in production environments.</b></p>
+
+<h4 class="anchor-heading"><a id="tiered_storage_config" 
class="anchor-link"></a><a href="#tiered_storage_config">Configuration</a></h4>
+
+<h5 class="anchor-heading"><a id="tiered_storage_config_broker" 
class="anchor-link"></a><a href="#tiered_storage_config_broker">Broker 
Configurations</a></h5>
+
+<p>By default, the Kafka server does not enable the tiered storage feature. <code>remote.log.storage.system.enable</code>
+  is the property that controls whether tiered storage functionality is enabled in a broker. Set it to "true" to enable this feature.
+</p>
+
+<p><code>RemoteStorageManager</code> is an interface for managing the lifecycle of remote log segments and indexes. The Kafka server
+  doesn't provide an out-of-the-box implementation of RemoteStorageManager. Configure <code>remote.log.storage.manager.class.name</code>
+  and <code>remote.log.storage.manager.class.path</code> to specify the implementation of RemoteStorageManager.
+</p>
+
+<p><code>RemoteLogMetadataManager</code> is an interface for managing the lifecycle of metadata about remote log segments with strongly consistent semantics.
+  By default, Kafka provides an implementation that uses an internal topic as storage. This implementation can be changed by configuring
+  <code>remote.log.metadata.manager.class.name</code> and <code>remote.log.metadata.manager.class.path</code>.
+  When adopting the default Kafka internal topic based implementation, <code>remote.log.metadata.manager.listener.name</code>
+  is a mandatory property that specifies which listener the clients created by the default RemoteLogMetadataManager implementation should use.
+</p>
+
+
+<h5 class="anchor-heading"><a id="tiered_storage_config_topic" 
class="anchor-link"></a><a href="#tiered_storage_config_topic">Topic 
Configurations</a></h5>
+
+<p>After the broker-side configurations for the tiered storage feature are set correctly, some topic-level configurations still need to be set.
+  <code>remote.storage.enable</code> is the switch that determines whether a topic uses tiered storage. By default it is set to false.
+  After enabling the <code>remote.storage.enable</code> property, the next thing to consider is log retention.
+  When tiered storage is enabled for a topic, there are two additional log retention configurations to set, the <code>local</code>-prefixed ones in the list below:
+
+<ul>
+  <li><code>local.retention.ms</code></li>
+  <li><code>retention.ms</code></li>
+  <li><code>local.retention.bytes</code></li>
+  <li><code>retention.bytes</code></li>
+</ul>
+
+  The configurations prefixed with <code>local</code> specify the time/size the "local" log file can reach before it is moved to remote storage and then deleted locally.
+  If unset, the values in <code>retention.ms</code> and <code>retention.bytes</code> will be used.
+</p>
+
+<h4 class="anchor-heading"><a id="tiered_storage_config_ex" 
class="anchor-link"></a><a href="#tiered_storage_config_ex">Configurations 
Example</a></h4>
+
+<p>Here is a sample configuration to enable the tiered storage feature on the broker side:
+<pre>
+# Sample ZooKeeper/KRaft broker server.properties listening on PLAINTEXT://:9092
+remote.log.storage.system.enable=true
+# Please provide the implementation for RemoteStorageManager. This is a mandatory configuration for tiered storage.
+# remote.log.storage.manager.class.name=org.apache.kafka.server.log.remote.storage.NoOpRemoteStorageManager
+# Using the "PLAINTEXT" listener for the clients in RemoteLogMetadataManager to talk to the brokers.
+remote.log.metadata.manager.listener.name=PLAINTEXT
+</pre>
+</p>
+
+<p>After the broker is started, create a topic with tiered storage enabled and a small local retention time to try this feature:
+<pre>bin/kafka-topics.sh --create --topic tieredTopic --bootstrap-server localhost:9092 --config remote.storage.enable=true --config local.retention.ms=1000
+</pre>
+</p>
+
+<p>Then, after the active segment is rolled, the old segment should be moved to the remote storage and deleted from the local disk.
+</p>
+
+<h4 class="anchor-heading"><a id="tiered_storage_limitation" 
class="anchor-link"></a><a 
href="#tiered_storage_limitation">Limitations</a></h4>
+
+<p>While the early access release of Tiered Storage offers the opportunity to try out this new feature, it is important to be aware of the following limitations:
+<ul>
+  <li>No support for clusters with multiple log directories (i.e. JBOD feature)</li>
+  <li>No support for compacted topics</li>
+  <li>Cannot disable tiered storage at the topic level</li>

Review Comment:
   Does this mean that once tiered storage is enabled for a topic, it cannot be disabled?


