This is an automated email from the ASF dual-hosted git repository.

michaelsmith pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git


The following commit(s) were added to refs/heads/master by this push:
     new 141f8b97f IMPALA-14492: Document delete orphan files for Iceberg table
141f8b97f is described below

commit 141f8b97ffa9d466df15cbfaa4706e267e27e5b4
Author: Riza Suminto <[email protected]>
AuthorDate: Mon Oct 13 12:14:42 2025 -0700

    IMPALA-14492: Document delete orphan files for Iceberg table
    
    This patch adds documentation for REMOVE_ORPHAN_FILES query added by
    IMPALA-12337.
    
    Change-Id: Ie8de6112bf9ccd879ea3e14d86e67b99e1087c0f
    Reviewed-on: http://gerrit.cloudera.org:8080/23532
    Reviewed-by: Michael Smith <[email protected]>
    Tested-by: Impala Public Jenkins <[email protected]>
    Reviewed-by: Zoltan Borok-Nagy <[email protected]>
---
 docs/topics/impala_iceberg.xml | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/docs/topics/impala_iceberg.xml b/docs/topics/impala_iceberg.xml
index 7756d43d3..c9d4689e4 100644
--- a/docs/topics/impala_iceberg.xml
+++ b/docs/topics/impala_iceberg.xml
@@ -811,6 +811,37 @@ ALTER TABLE ice_tbl EXECUTE expire_snapshots(now() - interval 5 days);
     </conbody>
   </concept>
 
+  <concept id="iceberg_remove_orphan_files">
+    <title>Removing orphan files</title>
+    <conbody>
+      <p>
+        Failures can leave files that are not referenced by table metadata. These are
+        called orphan files. In some cases, normal snapshot expiration may also be unable
+        to determine that a file is no longer needed and delete it. Impala can remove these
+        orphan files with the
+        <codeph>ALTER TABLE ... EXECUTE remove_orphan_files(...)</codeph>
+        statement, which removes all orphan files whose modification time is older
+        than the specified timestamp. For example:
+        <codeblock>
+-- Remove orphan files older than '2022-01-04 10:00:00'.
+ALTER TABLE ice_tbl EXECUTE remove_orphan_files('2022-01-04 10:00:00');
+
+-- Remove orphan files older than 5 days from now.
+ALTER TABLE ice_tbl EXECUTE remove_orphan_files(now() - interval 5 days);
+        </codeblock>
+      </p>
+      <p>
+        Note that this is a destructive query that deletes any file within the Iceberg
+        table's 'data' and 'metadata' directories that is not addressable by any valid
+        snapshot. It is dangerous to remove orphan files with a retention interval
+        shorter than the time expected for any write to complete, because the table might
+        be corrupted if in-progress files are considered orphaned and are deleted. It is
+        recommended to set the timestamp to a day ago or older for this remove orphan
+        files query.
+      </p>
+    </conbody>
+  </concept>
+
   <concept id="iceberg_metadata_tables">
     <title>Iceberg metadata tables</title>
     <conbody>
