szehon-ho commented on code in PR #7589:
URL: https://github.com/apache/iceberg/pull/7589#discussion_r1199239910
##########
docs/spark-procedures.md:
##########
@@ -364,6 +364,53 @@ Rewrite the manifests in table `db.sample` and disable the
use of Spark caching.
CALL catalog_name.system.rewrite_manifests('db.sample', false)
```
+### `rewrite_position_delete_files`
+
+Iceberg can rewrite position delete files, which serves two purposes:
+* Minor Compaction: Compact small position delete files into larger ones.
This reduces the size of metadata stored in manifest files and the overhead of
opening small delete files.
+* Remove Dangling Deletes: Filter out position delete records that refer to
data files that are no longer live. After rewrite_data_files, position delete
records pointing to the rewritten data files are not immediately marked for
removal and remain tracked by the table's live snapshot metadata. This is
known as the 'dangling delete' problem. It occurs because a single position
delete file can apply to more than one data file, and not all applicable data
files are removed during a rewrite.
+
+Iceberg can rewrite position delete files in parallel using Spark with the
`rewritePositionDeletes` action.
Review Comment:
Sure, I'll remove this. I copied it from the rewrite data files procedure.
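
For context on the quoted text, here is a rough conceptual sketch of the "Remove Dangling Deletes" filtering it describes. This is plain Python, not Iceberg code: `PositionDelete`, `drop_dangling`, and the file names are hypothetical and only illustrate the idea that a delete record is kept when its target data file is still live.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PositionDelete:
    """Hypothetical stand-in for one position delete record."""
    data_file: str  # path of the data file the delete applies to
    pos: int        # row position within that data file


def drop_dangling(deletes, live_data_files):
    """Keep only delete records whose target data file is still live."""
    live = set(live_data_files)
    return [d for d in deletes if d.data_file in live]


# One delete file covering two data files; a.parquet is assumed to have
# been rewritten away, so its delete records are now dangling.
deletes = [
    PositionDelete("a.parquet", 3),
    PositionDelete("a.parquet", 7),
    PositionDelete("b.parquet", 1),
]
kept = drop_dangling(deletes, live_data_files=["b.parquet", "c.parquet"])
```

Only the record for `b.parquet` survives here, which is why a delete file that spans several data files cannot simply be dropped when one of them is rewritten.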
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]