flyrain commented on code in PR #7589:
URL: https://github.com/apache/iceberg/pull/7589#discussion_r1199209402


##########
docs/spark-procedures.md:
##########
@@ -364,6 +364,53 @@ Rewrite the manifests in table `db.sample` and disable the 
use of Spark caching.
 CALL catalog_name.system.rewrite_manifests('db.sample', false)
 ```
 
+### `rewrite_position_delete_files`
+
+Iceberg can rewrite position delete files, which serves two purposes:
+* Minor Compaction: Compact small position delete files into larger ones.  
This reduces size of metadata stored in manifest files and overhead of opening 
small delete files.
+* Remove Dangling Deletes: Filter out position delete records that refer to 
data files that are no longer live.  After rewrite_data_files, position delete 
records pointing to the rewritten data files are not immediately marked for 
removal and remain tracked by the table's live snapshot metadata.  This is 
known as the 'dangling delete' problem, and is because a single position delete 
file can apply to more than one data file, and not all applicable data files 
are removed during rewrite.
+
+Iceberg can rewrite position delete files in parallel using Spark with the 
`rewritePositionDeletes` action.

Review Comment:
   We may not need this line, since this doc covers the procedure rather than 
the action. Alternatively, we could describe the procedure like this:
   ```
   This procedure rewrites position delete files in parallel.
   ```
   Also, if we want to emphasize "parallel", can we provide a bit more detail 
about how it works, e.g. whether it is parallel at the partition level or the 
file level?
   
   Or, if we want to mention the `rewritePositionDeletes` action, we could 
clarify that this procedure is built on that action.
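
   To make the "dangling delete" wording in the quoted section concrete, here 
is a minimal Python sketch. It is only a toy model and not Iceberg code: 
`PositionDelete`, `drop_dangling`, and the file paths are hypothetical names 
made up for illustration. It assumes a delete record is dangling exactly when 
its target data file is no longer in the live set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PositionDelete:
    data_file: str  # path of the data file this delete record applies to
    pos: int        # row position within that data file


def drop_dangling(deletes, live_data_files):
    """Keep only delete records whose target data file is still live."""
    live = set(live_data_files)
    return [d for d in deletes if d.data_file in live]


# One delete file holding records for two data files (hypothetical paths).
delete_file = [
    PositionDelete("data/a.parquet", 3),
    PositionDelete("data/a.parquet", 7),
    PositionDelete("data/b.parquet", 1),
]

# Suppose a data-file rewrite replaced a.parquet with a compacted file,
# while b.parquet was left untouched.
live_after_rewrite = ["data/a-compacted.parquet", "data/b.parquet"]

# The records for a.parquet are dangling; the record for b.parquet is kept.
kept = drop_dangling(delete_file, live_after_rewrite)
```

   Because the same delete file still carries a live record for `b.parquet`, 
it cannot simply be dropped as a whole when `a.parquet` is rewritten, which 
is the situation the doc describes.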



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

