gaborkaszab commented on code in PR #13432:
URL: https://github.com/apache/iceberg/pull/13432#discussion_r2224559891


##########
docs/docs/maintenance.md:
##########
@@ -63,20 +63,16 @@ Expiring old snapshots removes them from metadata, so they 
are no longer availab
 
 ### Remove old metadata files
 
-Iceberg keeps track of table metadata using JSON files. Each change to a table 
produces a new metadata file to provide atomicity.
+Iceberg keeps track of table metadata using JSON files. Each change to a table 
produces a new metadata file to provide atomicity. Old metadata files are kept 
for history by default, and are tracked in the `metadata-log` field of the 
metadata file. Tables with frequent commits, like those written by streaming 
jobs, may need to regularly clean metadata files to reduce metadata overhead.
 
-Old metadata files are kept for history by default. Tables with frequent 
commits, like those written by streaming jobs, may need to regularly clean 
metadata files.
+The number of metadata files being **tracked** is defined by 
`write.metadata.previous-versions-max` (default is 100).
 
-To automatically clean metadata files, set 
`write.metadata.delete-after-commit.enabled=true` in table properties. This 
will keep some metadata files (up to `write.metadata.previous-versions-max`) 
and will delete the oldest metadata file after each new one is created.
+To automatically delete older metadata files when they become untracked, set 
`write.metadata.delete-after-commit.enabled=true` in table properties. This 
will keep some metadata files as tracked (up to 
`write.metadata.previous-versions-max`), and will delete the oldest metadata 
file every time a new one is created.
+Alternatively, untracked metadata files can be deleted as part of [orphan file 
deletion](#delete-orphan-files).
 
-| Property                                     | Description                   
                                           |

Review Comment:
   I wouldn't get rid of this table. It give a nice visual help to see which 2 
configs are involved here, even though this could be considered duplicate 
information.



##########
docs/docs/maintenance.md:
##########
@@ -63,20 +63,16 @@ Expiring old snapshots removes them from metadata, so they 
are no longer availab
 
 ### Remove old metadata files
 
-Iceberg keeps track of table metadata using JSON files. Each change to a table 
produces a new metadata file to provide atomicity.
+Iceberg keeps track of table metadata using JSON files. Each change to a table 
produces a new metadata file to provide atomicity. Old metadata files are kept 
for history by default, and are tracked in the `metadata-log` field of the 
metadata file. Tables with frequent commits, like those written by streaming 
jobs, may need to regularly clean metadata files to reduce metadata overhead.

Review Comment:
   I'm not sure why the 2 paragraphs were merged. I found it more readable the 
original way. This is too much info squeezed into a single paragraph.
   
   The extensions to the sentences make sense for me.



##########
docs/docs/maintenance.md:
##########
@@ -63,20 +63,16 @@ Expiring old snapshots removes them from metadata, so they 
are no longer availab
 
 ### Remove old metadata files
 
-Iceberg keeps track of table metadata using JSON files. Each change to a table 
produces a new metadata file to provide atomicity.
+Iceberg keeps track of table metadata using JSON files. Each change to a table 
produces a new metadata file to provide atomicity. Old metadata files are kept 
for history by default, and are tracked in the `metadata-log` field of the 
metadata file. Tables with frequent commits, like those written by streaming 
jobs, may need to regularly clean metadata files to reduce metadata overhead.
 
-Old metadata files are kept for history by default. Tables with frequent 
commits, like those written by streaming jobs, may need to regularly clean 
metadata files.
+The number of metadata files being **tracked** is defined by 
`write.metadata.previous-versions-max` (default is 100).
 
-To automatically clean metadata files, set 
`write.metadata.delete-after-commit.enabled=true` in table properties. This 
will keep some metadata files (up to `write.metadata.previous-versions-max`) 
and will delete the oldest metadata file after each new one is created.
+To automatically delete older metadata files when they become untracked, set 
`write.metadata.delete-after-commit.enabled=true` in table properties. This 
will keep some metadata files as tracked (up to 
`write.metadata.previous-versions-max`), and will delete the oldest metadata 
file every time a new one is created.

Review Comment:
   `when they become untracked` I don't feel this addition is necessary. The 
below sentence clarifies that



##########
docs/docs/maintenance.md:
##########
@@ -63,20 +63,16 @@ Expiring old snapshots removes them from metadata, so they 
are no longer availab
 
 ### Remove old metadata files
 
-Iceberg keeps track of table metadata using JSON files. Each change to a table 
produces a new metadata file to provide atomicity.
+Iceberg keeps track of table metadata using JSON files. Each change to a table 
produces a new metadata file to provide atomicity. Old metadata files are kept 
for history by default, and are tracked in the `metadata-log` field of the 
metadata file. Tables with frequent commits, like those written by streaming 
jobs, may need to regularly clean metadata files to reduce metadata overhead.
 
-Old metadata files are kept for history by default. Tables with frequent 
commits, like those written by streaming jobs, may need to regularly clean 
metadata files.
+The number of metadata files being **tracked** is defined by 
`write.metadata.previous-versions-max` (default is 100).
 
-To automatically clean metadata files, set 
`write.metadata.delete-after-commit.enabled=true` in table properties. This 
will keep some metadata files (up to `write.metadata.previous-versions-max`) 
and will delete the oldest metadata file after each new one is created.
+To automatically delete older metadata files when they become untracked, set 
`write.metadata.delete-after-commit.enabled=true` in table properties. This 
will keep some metadata files as tracked (up to 
`write.metadata.previous-versions-max`), and will delete the oldest metadata 
file every time a new one is created.
+Alternatively, untracked metadata files can be deleted as part of [orphan file 
deletion](#delete-orphan-files).
 
-| Property                                     | Description                   
                                           |
-| -------------------------------------------- 
|--------------------------------------------------------------------------|
-| `write.metadata.delete-after-commit.enabled` | Whether to delete old 
**tracked** metadata files after each table commit |
-| `write.metadata.previous-versions-max`       | The number of old metadata 
files to keep                                 |
-
-Note that this will only delete metadata files that are **tracked** in the 
metadata log and will not delete orphaned metadata files.
-Example: With `write.metadata.delete-after-commit.enabled=false` and 
`write.metadata.previous-versions-max=10`, one will have 10 tracked metadata 
files and 90 orphaned metadata files after 100 commits.
-Configuring `write.metadata.delete-after-commit.enabled=true` and 
`write.metadata.previous-versions-max=20` will not automatically delete 
metadata files. Tracked metadata files would be deleted again when reaching 
`write.metadata.previous-versions-max=20`.

Review Comment:
   This was dropped, however I think it held useful information.



##########
docs/docs/maintenance.md:
##########
@@ -63,20 +63,16 @@ Expiring old snapshots removes them from metadata, so they 
are no longer availab
 
 ### Remove old metadata files
 
-Iceberg keeps track of table metadata using JSON files. Each change to a table 
produces a new metadata file to provide atomicity.
+Iceberg keeps track of table metadata using JSON files. Each change to a table 
produces a new metadata file to provide atomicity. Old metadata files are kept 
for history by default, and are tracked in the `metadata-log` field of the 
metadata file. Tables with frequent commits, like those written by streaming 
jobs, may need to regularly clean metadata files to reduce metadata overhead.
 
-Old metadata files are kept for history by default. Tables with frequent 
commits, like those written by streaming jobs, may need to regularly clean 
metadata files.
+The number of metadata files being **tracked** is defined by 
`write.metadata.previous-versions-max` (default is 100).
 
-To automatically clean metadata files, set 
`write.metadata.delete-after-commit.enabled=true` in table properties. This 
will keep some metadata files (up to `write.metadata.previous-versions-max`) 
and will delete the oldest metadata file after each new one is created.
+To automatically delete older metadata files when they become untracked, set 
`write.metadata.delete-after-commit.enabled=true` in table properties. This 
will keep some metadata files as tracked (up to 
`write.metadata.previous-versions-max`), and will delete the oldest metadata 
file every time a new one is created.
+Alternatively, untracked metadata files can be deleted as part of [orphan file 
deletion](#delete-orphan-files).

Review Comment:
   `Alternatively` This give the impression that there is an alternative 
approach to clean up tracked metadata files. However, this is a way to clean up 
untracked ones, so this might be misleading.



##########
docs/docs/configuration.md:
##########
@@ -74,8 +74,8 @@ Iceberg tables support table properties to configure table 
behavior, like the de
 | write.merge.distribution-mode                        | (not set)             
      | Defines distribution of write merge data                                
                                                                                
                                          |
 | write.wap.enabled                                    | false                 
      | Enables write-audit-publish writes                                      
                                                                                
                                          |
 | write.summary.partition-limit                        | 0                     
      | Includes partition-level summary stats in snapshot summaries if the 
changed partition count is less than this limit                                 
                                              |
-| write.metadata.delete-after-commit.enabled           | false                 
      | Controls whether to delete the oldest **tracked** version metadata 
files after commit                                                              
                                               |
-| write.metadata.previous-versions-max                 | 100                   
      | The max number of previous version metadata files to keep before 
deleting after commit                                                           
                                                 |
+| write.metadata.previous-versions-max                 | 100                   
      | The max number of previous version metadata files to track              
                                                                                
               |
+| write.metadata.delete-after-commit.enabled           | false                 
      | Controls whether to delete the oldest **tracked** version metadata 
files after each table commit. See the [Remove old metadata 
files](maintenance.md#remove-old-metadata-files) section for additional details 
                                                                                
                            |

Review Comment:
   Adding the link to the maintenance page is a great idea!



##########
docs/docs/configuration.md:
##########
@@ -74,8 +74,8 @@ Iceberg tables support table properties to configure table 
behavior, like the de
 | write.merge.distribution-mode                        | (not set)             
      | Defines distribution of write merge data                                
                                                                                
                                          |
 | write.wap.enabled                                    | false                 
      | Enables write-audit-publish writes                                      
                                                                                
                                          |
 | write.summary.partition-limit                        | 0                     
      | Includes partition-level summary stats in snapshot summaries if the 
changed partition count is less than this limit                                 
                                              |
-| write.metadata.delete-after-commit.enabled           | false                 
      | Controls whether to delete the oldest **tracked** version metadata 
files after commit                                                              
                                               |
-| write.metadata.previous-versions-max                 | 100                   
      | The max number of previous version metadata files to keep before 
deleting after commit                                                           
                                                 |
+| write.metadata.previous-versions-max                 | 100                   
      | The max number of previous version metadata files to track              
                                                                                
               |

Review Comment:
   I don't think the order of these 2 configs matter as long as they are right 
after each other. To keep the PR as clean as possible I'd keep the original 
ordering



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to