szehon-ho commented on code in PR #7589:
URL: https://github.com/apache/iceberg/pull/7589#discussion_r1199488808


##########
docs/spark-procedures.md:
##########
@@ -364,6 +364,53 @@ Rewrite the manifests in table `db.sample` and disable the 
use of Spark caching.
 CALL catalog_name.system.rewrite_manifests('db.sample', false)
 ```
 
+### `rewrite_position_delete_files`
+
+Iceberg can rewrite position delete files, which serves two purposes:
+* Minor Compaction: Compact small position delete files into larger ones.  
This reduces the size of metadata stored in manifest files and the overhead of 
opening small delete files.
+* Remove Dangling Deletes: Filter out position delete records that refer to 
data files that are no longer live.  After rewrite_data_files, position delete 
records pointing to the rewritten data files are not immediately marked for 
removal and remain tracked by the table's live snapshot metadata.  This is 
known as the 'dangling delete' problem.  It arises because a single position 
delete file can apply to more than one data file, and not all applicable data 
files are removed during a rewrite.
+
+Iceberg can rewrite position delete files in parallel using Spark with the 
`rewritePositionDeletes` action.
+
+#### Usage
+
+| Argument Name | Required? | Type | Description |
+|---------------|-----------|------|-------------|
+| `table`       | ✔️  | string | Name of the table to update |
+| `options`     | ️   | map<string, string> | Options to be used for actions|
+
+See the [`SizeBasedFileRewriter` Javadoc](../../../javadoc/{{% icebergVersion 
%}}/org/apache/iceberg/actions/SizeBasedFileRewriter.html#field.summary)
+for a list of all the supported options for this action.
+
+All rewritten position delete files are filtered to remove dangling deletes.
+
+#### Output
+
+| Output Name                    | Type | Description                          
                                       |
+|--------------------------------|------|-----------------------------------------------------------------------------|
+| `rewritten_delete_files_count` | int  | Number of delete files which were 
removed by this command                   |
+| `rewritten_bytes_count`        | long | Count of bytes across delete files 
which were removed by this command       |
+| `added_delete_files_count`     | int  | Number of delete files which were 
added by this command                     |
+| `added_bytes_count`            | long | Count of bytes across all new delete 
files which were added by this command |
+
+
+#### Examples
+
+Rewrite position delete files in table `db.sample` with default options.  This 
selects position delete files that meet the default count-per-partition 
and size thresholds and rewrites them using the default target sizes.  Dangling 
deletes are removed from rewritten delete files.

Review Comment:
   Yes, the option name is `target-file-size-bytes`.  I added it to the 
description.  We should probably document these options properly instead of 
soft-linking to the Javadoc, but I left it as is to stay in line with 
rewrite_data_files.  We can do that in a later PR.
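
   For illustration only, here is a sketch of how the example under discussion 
and the `target-file-size-bytes` option might be invoked.  It assumes the 
procedure takes the `table` and `options` arguments listed in the Usage table 
and follows the same `CALL` conventions as the other procedures in this file; 
`catalog_name` is a placeholder and the size value is arbitrary, not a 
documented default.

   ```sql
   -- Rewrite position delete files in db.sample with default options.
   CALL catalog_name.system.rewrite_position_delete_files('db.sample');

   -- Same rewrite with an explicit target size (value chosen for illustration).
   CALL catalog_name.system.rewrite_position_delete_files(
     table => 'db.sample',
     options => map('target-file-size-bytes', '67108864')
   );
   ```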



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

