RussellSpitzer commented on a change in pull request #3375:
URL: https://github.com/apache/iceberg/pull/3375#discussion_r737537547
##########
File path: site/docs/spark-procedures.md
##########
@@ -240,6 +240,34 @@ Remove any files in the `tablelocation/data` folder which are not known to the t
CALL catalog_name.system.remove_orphan_files(table => 'db.sample', location => 'tablelocation/data')
```
+### `rewrite_data_files`
+
+Iceberg tracks each data file in a table. More data files lead to more metadata stored in manifest files, and small data files cause an unnecessary amount of metadata and less efficient queries due to file open costs.
+
+Iceberg can compact data files in parallel using Spark with the `rewriteDataFiles` action. This will combine small files into larger files to reduce metadata overhead and runtime file open cost.
+
+#### Usage
+
+| Argument Name | Required? | Type | Description |
+|---------------|-----------|------|-------------|
+| `table` | ✔️ | string | Name of the table to update |
+| `strategy` | | string | Name of the strategy - binpack or sort |
Review comment:
Probably best to include the default here
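For example, assuming `binpack` is the default (as it is for the `rewriteDataFiles` action, though this is worth confirming against the procedure code), the description could note that omitting `strategy` is equivalent to passing it explicitly. A rough sketch, reusing the `catalog_name` and `db.sample` placeholders from the surrounding examples:

```sql
-- Relies on the default strategy (assumed here to be binpack)
CALL catalog_name.system.rewrite_data_files(table => 'db.sample');

-- Same call with the strategy spelled out
CALL catalog_name.system.rewrite_data_files(table => 'db.sample', strategy => 'binpack');
```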