pvary commented on code in PR #13082:
URL: https://github.com/apache/iceberg/pull/13082#discussion_r2095150379
##########
docs/docs/spark-procedures.md:
##########
@@ -389,24 +389,25 @@ Iceberg can compact data files in parallel using Spark with the `rewriteDataFile
#### Options
##### General Options
-| Name | Default Value | Description |
-|------|---------------|-------------|
-| `max-concurrent-file-group-rewrites` | 5 | Maximum number of file groups to be simultaneously rewritten |
-| `partial-progress.enabled` | false | Enable committing groups of files prior to the entire rewrite completing |
-| `partial-progress.max-commits` | 10 | Maximum amount of commits that this rewrite is allowed to produce if partial progress is enabled |
-| `partial-progress.max-failed-commits` | value of `partital-progress.max-commits` | Maximum amount of failed commits allowed before job failure, if partial progress is enabled |
-| `use-starting-sequence-number` | true | Use the sequence number of the snapshot at compaction start time instead of that of the newly produced snapshot |
-| `rewrite-job-order` | none | Force the rewrite job order based on the value. <ul><li>If rewrite-job-order=bytes-asc, then rewrite the smallest job groups first.</li><li>If rewrite-job-order=bytes-desc, then rewrite the largest job groups first.</li><li>If rewrite-job-order=files-asc, then rewrite the job groups with the least files first.</li><li>If rewrite-job-order=files-desc, then rewrite the job groups with the most files first.</li><li>If rewrite-job-order=none, then rewrite job groups in the order they were planned (no specific ordering).</li></ul> |
-| `target-file-size-bytes` | 536870912 (512 MB, default value of `write.target-file-size-bytes` from [table properties](configuration.md#write-properties)) | Target output file size |
-| `min-file-size-bytes` | 75% of target file size | Files under this threshold will be considered for rewriting regardless of any other criteria |
-| `max-file-size-bytes` | 180% of target file size | Files with sizes above this threshold will be considered for rewriting regardless of any other criteria |
-| `min-input-files` | 5 | Any file group exceeding this number of files will be rewritten regardless of other criteria |
-| `rewrite-all` | false | Force rewriting of all provided files overriding other options |
-| `max-file-group-size-bytes` | 107374182400 (100GB) | Largest amount of data that should be rewritten in a single file group. The entire rewrite operation is broken down into pieces based on partitioning and within partitions based on size into file-groups. This helps with breaking down the rewriting of very large partitions which may not be rewritable otherwise due to the resource constraints of the cluster. |
-| `delete-file-threshold` | 2147483647 | Minimum number of deletes that needs to be associated with a data file for it to be considered for rewriting |
-| `delete-ratio-threshold` | 0.3 | Minimum deletion ratio that needs to be associated with a data file for it to be considered for rewriting |
-| `output-spec-id` | current partition spec id | Identifier of the output partition spec. Data will be reorganized during the rewrite to align with the output partitioning. |
-| `remove-dangling-deletes` | false | Remove dangling position and equality deletes after rewriting. A delete file is considered dangling if it does not apply to any live data files. Enabling this will generate an additional commit for the removal. |
+| Name | Default Value | Description |
Review Comment:
I think most of these changes are just formatting. Could we revert the
formatting-only changes?