This is an automated email from the ASF dual-hosted git repository.
szehon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/iceberg.git
The following commit(s) were added to refs/heads/master by this push:
new 8c2a924a8d Docs: RewritePositionDeleteFiles procedure (#7589)
8c2a924a8d is described below
commit 8c2a924a8d712a273205c9b4e38afcd0f36717ec
Author: Szehon Ho <[email protected]>
AuthorDate: Sat May 20 22:08:17 2023 -0700
Docs: RewritePositionDeleteFiles procedure (#7589)
---
docs/spark-procedures.md | 45 +++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 45 insertions(+)
diff --git a/docs/spark-procedures.md b/docs/spark-procedures.md
index b09eb9f097..b1275adde7 100644
--- a/docs/spark-procedures.md
+++ b/docs/spark-procedures.md
@@ -365,6 +365,51 @@ Rewrite the manifests in table `db.sample` and disable the use of Spark caching.
```sql
CALL catalog_name.system.rewrite_manifests('db.sample', false)
```
+### `rewrite_position_delete_files`
+
+Iceberg can rewrite position delete files, which serves two purposes:
+* Minor Compaction: Compact small position delete files into larger ones. This reduces the size of metadata stored in manifest files and the overhead of opening small delete files.
+* Remove Dangling Deletes: Filter out position delete records that refer to data files that are no longer live. After `rewrite_data_files`, position delete records pointing to the rewritten data files are not always marked for removal, and can remain tracked by the table's live snapshot metadata. This is known as the 'dangling delete' problem.
+
+#### Usage
+
+| Argument Name | Required? | Type | Description |
+|---------------|-----------|------|----------------------------------|
+| `table` | ✔️ | string | Name of the table to update |
+| `options` | | map<string, string> | Options to be used for procedure |
+
+See the [`SizeBasedFileRewriter` Javadoc](../../../javadoc/{{% icebergVersion %}}/org/apache/iceberg/actions/SizeBasedFileRewriter.html#field.summary)
+for a list of all the supported options for this procedure.
+
+Dangling deletes are always filtered out during rewriting.
+
+#### Output
+
+| Output Name                    | Type | Description                                                                |
+|--------------------------------|------|----------------------------------------------------------------------------|
+| `rewritten_delete_files_count` | int  | Number of delete files which were removed by this command                  |
+| `added_delete_files_count`     | int  | Number of delete files which were added by this command                    |
+| `rewritten_bytes_count`        | long | Count of bytes across delete files which were removed by this command      |
+| `added_bytes_count`            | long | Count of bytes across all new delete files which were added by this command |
+
+
+#### Examples
+
+Rewrite position delete files in table `db.sample`. This selects position delete files that fit the default rewrite criteria, and writes new files of target size `target-file-size-bytes`. Dangling deletes are removed from rewritten delete files.
+```sql
+CALL catalog_name.system.rewrite_position_delete_files('db.sample')
+```
+
+Rewrite all position delete files in table `db.sample`, writing new files of target size `target-file-size-bytes`. Dangling deletes are removed from rewritten delete files.
+```sql
+CALL catalog_name.system.rewrite_position_delete_files(table => 'db.sample', options => map('rewrite-all', 'true'))
+```
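+
+As an illustrative sketch, dangling deletes left behind by a data file compaction could be cleaned up by running the two procedures back to back. This assumes `rewrite_data_files` is called with its default options, and uses `rewrite-all` so that every position delete file is rewritten and therefore filtered. The trailing semicolons are only there to separate the two statements in a SQL shell.
+```sql
+-- Compact data files first; position deletes that pointed at the rewritten files may become dangling.
+CALL catalog_name.system.rewrite_data_files('db.sample');
+-- Then rewrite every position delete file so that dangling records are filtered out.
+CALL catalog_name.system.rewrite_position_delete_files(table => 'db.sample', options => map('rewrite-all', 'true'));
+```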
+
+Rewrite position delete files in table `db.sample`. This selects position delete files in partitions where 2 or more position delete files need to be rewritten based on size criteria. Dangling deletes are removed from rewritten delete files.
+```sql
+CALL catalog_name.system.rewrite_position_delete_files(table => 'db.sample', options => map('min-input-files','2'))
+```
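+
+Several options can be passed in the same map. The sketch below combines `min-input-files` with an explicit `target-file-size-bytes`; the value `67108864` (64 MB) is an arbitrary illustration, not a default or a recommendation.
+```sql
+-- Rewrite only where at least 2 position delete files qualify, targeting roughly 64 MB output files (illustrative value).
+CALL catalog_name.system.rewrite_position_delete_files(table => 'db.sample', options => map('target-file-size-bytes', '67108864', 'min-input-files', '2'))
+```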
+
## Table migration
The `snapshot` and `migrate` procedures help test and migrate existing Hive or Spark tables to Iceberg.