tinaselenge commented on code in PR #14382: URL: https://github.com/apache/kafka/pull/14382#discussion_r1325541475
########## docs/ops.html: ########## @@ -3859,6 +3859,98 @@ <h3>Finalizing the migration</h3> # Other configs ...</pre> + +<h3 class="anchor-heading"><a id="tiered_storage" class="anchor-link"></a><a href="#kraft">6.11 Tiered Storage</a></h3> + +<h4 class="anchor-heading"><a id="tiered_storage_overview" class="anchor-link"></a><a href="#tiered_storage_overview">Tiered Storage Overview</a></h4> + +<p>Kafka data is mostly consumed in a streaming fashion using tail reads. Tail reads leverage OS's page cache to serve the data instead of disk reads. + Older data is typically read from the disk for backfill or failure recovery purposes and is infrequent.</p> + +<p>In the tiered storage approach, Kafka cluster is configured with two tiers of storage - local and remote. + The local tier is the same as the current Kafka that uses the local disks on the Kafka brokers to store the log segments. + The new remote tier uses external storage systems, such as HDFS or S3, to store the completed log segments. + Please check <a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-405%3A+Kafka+Tiered+Storage">KIP-405</a> for more information. +</p> + +<p><b>Note: Tiered storage is considered as an early access feature, and is not recommended for use in production environments</b></p> + +<h4 class="anchor-heading"><a id="tiered_storage_config" class="anchor-link"></a><a href="#tiered_storage_config">Configuration</a></h4> + +<h5 class="anchor-heading"><a id="tiered_storage_config_broker" class="anchor-link"></a><a href="#tiered_storage_config_broker">Broker Configurations</a></h5> + +<p>By default, Kafka server will not enable tiered storage feature. <code>remote.log.storage.system.enable</code> + is the property to control whether to enable tiered storage functionality in a broker or not. Setting it to "true" to enable this feature. +</p> + +<p><code>RemoteStorageManager</code> is an interface to provide the lifecycle of remote log segments and indexes. 
Kafka server + doesn't provide out-of-the-box implementation of RemoteStorageManager. Configuring <code>remote.log.storage.manager.class.name</code> + and <code>remote.log.storage.manager.class.path</code> to specify the implementation of RemoteStorageManager. +</p> + +<p><code>RemoteLogMetadataManager</code> is an interface to provide the lifecycle of metadata about remote log segments with strongly consistent semantics. + By default, Kafka provides an implementation with storage as an internal topic. This implementation can be changed by configuring + <code>remote.log.metadata.manager.class.name</code> and <code>remote.log.metadata.manager.class.path</code>. + When adopting the default kafka internal topic based implementation, <code>remote.log.metadata.manager.listener.name</code> + is a mandatory property to specify which listener the clients created by the default RemoteLogMetadataManager implementation. +</p> + + +<h5 class="anchor-heading"><a id="tiered_storage_config_topic" class="anchor-link"></a><a href="#tiered_storage_config_topic">Topic Configurations</a></h5> + +<p>After correctly configuring broker side configurations for tiered storage feature, there are still configurations in topic level needed to be set. + <code>remote.storage.enable</code> is the switch to determine if this topic want to use tiered storage or not. By default it is set as false. + After enabling <code>remote.storage.enable</code> property, the next thing to consider is the log retention. 
+ When tiered storage is enabled in the topic, there will be 2 additional log retention configuration to set:

Review Comment:
```suggestion
When tiered storage is enabled for a topic, there are 2 additional log retention configurations to set:
```

########## docs/ops.html: ########## @@ -3859,6 +3859,98 @@ <h3>Finalizing the migration</h3> # Other configs ...</pre> + +<h3 class="anchor-heading"><a id="tiered_storage" class="anchor-link"></a><a href="#kraft">6.11 Tiered Storage</a></h3> + +<h4 class="anchor-heading"><a id="tiered_storage_overview" class="anchor-link"></a><a href="#tiered_storage_overview">Tiered Storage Overview</a></h4> + +<p>Kafka data is mostly consumed in a streaming fashion using tail reads. Tail reads leverage OS's page cache to serve the data instead of disk reads. + Older data is typically read from the disk for backfill or failure recovery purposes and is infrequent.</p> + +<p>In the tiered storage approach, Kafka cluster is configured with two tiers of storage - local and remote. + The local tier is the same as the current Kafka that uses the local disks on the Kafka brokers to store the log segments. + The new remote tier uses external storage systems, such as HDFS or S3, to store the completed log segments. + Please check <a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-405%3A+Kafka+Tiered+Storage">KIP-405</a> for more information. +</p> + +<p><b>Note: Tiered storage is considered as an early access feature, and is not recommended for use in production environments</b></p> + +<h4 class="anchor-heading"><a id="tiered_storage_config" class="anchor-link"></a><a href="#tiered_storage_config">Configuration</a></h4> + +<h5 class="anchor-heading"><a id="tiered_storage_config_broker" class="anchor-link"></a><a href="#tiered_storage_config_broker">Broker Configurations</a></h5> + +<p>By default, Kafka server will not enable tiered storage feature. 
<code>remote.log.storage.system.enable</code> + is the property to control whether to enable tiered storage functionality in a broker or not. Setting it to "true" to enable this feature. +</p> + +<p><code>RemoteStorageManager</code> is an interface to provide the lifecycle of remote log segments and indexes. Kafka server + doesn't provide out-of-the-box implementation of RemoteStorageManager. Configuring <code>remote.log.storage.manager.class.name</code> + and <code>remote.log.storage.manager.class.path</code> to specify the implementation of RemoteStorageManager. +</p> + +<p><code>RemoteLogMetadataManager</code> is an interface to provide the lifecycle of metadata about remote log segments with strongly consistent semantics. + By default, Kafka provides an implementation with storage as an internal topic. This implementation can be changed by configuring + <code>remote.log.metadata.manager.class.name</code> and <code>remote.log.metadata.manager.class.path</code>. + When adopting the default kafka internal topic based implementation, <code>remote.log.metadata.manager.listener.name</code> + is a mandatory property to specify which listener the clients created by the default RemoteLogMetadataManager implementation. +</p> + + +<h5 class="anchor-heading"><a id="tiered_storage_config_topic" class="anchor-link"></a><a href="#tiered_storage_config_topic">Topic Configurations</a></h5> + +<p>After correctly configuring broker side configurations for tiered storage feature, there are still configurations in topic level needed to be set. + <code>remote.storage.enable</code> is the switch to determine if this topic want to use tiered storage or not. By default it is set as false.

Review Comment:
```suggestion
<code>remote.storage.enable</code> is the switch to determine if a topic wants to use tiered storage or not. By default it is set to false.
```

--
This is an automated message from the Apache Git Service.
