This is an automated email from the ASF dual-hosted git repository.

stigahuang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git

commit fa64be7cc7074f201fff1eccc9cbf19520a19c55
Author: Riza Suminto <[email protected]>
AuthorDate: Thu Feb 23 16:05:31 2023 -0800

    IMPALA-11940: [DOCS] Document manifest caching settings for Iceberg
    
    IMPALA-11658 implements Iceberg manifest caching for Impala. This patch
    adds documentation for configuring the cache(s).
    
    Testing:
    - Built docs locally
    
    Change-Id: Idd761a81f5c81a25a5ec0889402f85157c23e9fe
    Reviewed-on: http://gerrit.cloudera.org:8080/19530
    Reviewed-by: Daniel Becker <[email protected]>
    Tested-by: Impala Public Jenkins <[email protected]>
    Reviewed-by: Zoltan Borok-Nagy <[email protected]>
---
 docs/topics/impala_iceberg.xml | 60 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 60 insertions(+)

diff --git a/docs/topics/impala_iceberg.xml b/docs/topics/impala_iceberg.xml
index 32363f3de..62abca615 100644
--- a/docs/topics/impala_iceberg.xml
+++ b/docs/topics/impala_iceberg.xml
@@ -606,4 +606,64 @@ ALTER TABLE ice_tbl EXECUTE expire_snapshots(now() - interval 5 days);
       </p>
     </conbody>
   </concept>
+
+  <concept id="iceberg_manifest_caching">
+    <title>Iceberg manifest caching</title>
+    <conbody>
+      <p>
+        Starting from version 1.1.0, Apache Iceberg provides a mechanism to cache the
+        contents of Iceberg manifest files in memory. This manifest caching feature
+        reduces repeated reads of small Iceberg manifest files from remote storage by
+        Coordinators and Catalogd. It can be enabled for Impala Coordinators and
+        Catalogd by setting properties in Hadoop's core-site.xml as follows:
+        <codeblock>
+iceberg.io-impl=org.apache.iceberg.hadoop.HadoopFileIO;
+iceberg.io.manifest.cache-enabled=true;
+iceberg.io.manifest.cache.max-total-bytes=104857600;
+iceberg.io.manifest.cache.expiration-interval-ms=3600000;
+iceberg.io.manifest.cache.max-content-length=8388608;
+        </codeblock>
+      </p>
+      <p>
+        The description of each property is as follows:
+        <ul>
+          <li>
+            <codeph>iceberg.io-impl</codeph>: custom FileIO implementation to use in a
+            catalog. Must be set to enable manifest caching. Impala defaults to
+            HadoopFileIO. Changing this to anything other than HadoopFileIO is not recommended.
+          </li>
+          <li>
+            <codeph>iceberg.io.manifest.cache-enabled</codeph>: enable/disable the
+            manifest caching feature.
+          </li>
+          <li>
+            <codeph>iceberg.io.manifest.cache.max-total-bytes</codeph>: maximum total
+            number of bytes to cache in the manifest cache. Must be a positive value.
+          </li>
+          <li>
+            <codeph>iceberg.io.manifest.cache.expiration-interval-ms</codeph>: maximum
+            duration, in milliseconds, for which an entry stays in the manifest cache.
+            Must be a non-negative value. Zero means entries are removed only when
+            evicted due to memory pressure from
+            <codeph>iceberg.io.manifest.cache.max-total-bytes</codeph>.
+          </li>
+          <li>
+            <codeph>iceberg.io.manifest.cache.max-content-length</codeph>: maximum length,
+            in bytes, of a manifest file to be considered for caching. Manifest files
+            longer than this value are not cached. Must be set to a
+            positive value lower than
+            <codeph>iceberg.io.manifest.cache.max-total-bytes</codeph>.
+          </li>
+        </ul>
+      </p>
+      <p>
+        Manifest caching only works for tables that are loaded with either
+        HadoopCatalog or HiveCatalog. Each HadoopCatalog and HiveCatalog has its own
+        manifest cache, all with the same configuration. By default, only 8 catalogs
+        can have their manifest cache active in memory. This number can be raised by
+        setting a higher value in the Java system property
+        <codeph>iceberg.io.manifest.cache.fileio-max</codeph>.
+      </p>
+    </conbody>
+  </concept>
 </concept>
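
For reference, the key=value listing in the patch might translate to
core-site.xml entries along the following lines. This is only a sketch: it
assumes the standard Hadoop <property>/<name>/<value> layout, and it assumes
the trailing semicolons in the listing are separators and not part of the
values. The property names and values are the ones shown in the patch.

```xml
<!-- Sketch of core-site.xml entries for Iceberg manifest caching.
     Assumption: trailing semicolons from the patch listing are dropped. -->
<configuration>
  <property>
    <name>iceberg.io-impl</name>
    <value>org.apache.iceberg.hadoop.HadoopFileIO</value>
  </property>
  <property>
    <name>iceberg.io.manifest.cache-enabled</name>
    <value>true</value>
  </property>
  <property>
    <!-- 104857600 bytes = 100 MiB total cache size -->
    <name>iceberg.io.manifest.cache.max-total-bytes</name>
    <value>104857600</value>
  </property>
  <property>
    <!-- 3600000 ms = 1 hour -->
    <name>iceberg.io.manifest.cache.expiration-interval-ms</name>
    <value>3600000</value>
  </property>
  <property>
    <!-- 8388608 bytes = 8 MiB; lower than max-total-bytes, as required -->
    <name>iceberg.io.manifest.cache.max-content-length</name>
    <value>8388608</value>
  </property>
</configuration>
```

With these values, a manifest file larger than 8 MiB is read from storage on
every access, while smaller ones share the 100 MiB cache.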
