snazy commented on code in PR #8382:
URL: https://github.com/apache/iceberg/pull/8382#discussion_r1324153854


##########
nessie/src/main/java/org/apache/iceberg/nessie/NessieUtil.java:
##########
@@ -111,23 +111,31 @@ public static TableMetadata 
updateTableMetadataWithNessieSpecificProperties(
     // Update the TableMetadata with the Content of NessieTableState.
     Map<String, String> newProperties = 
Maps.newHashMap(tableMetadata.properties());
     newProperties.put(NessieTableOperations.NESSIE_COMMIT_ID_PROPERTY, 
reference.getHash());
+
     // To prevent accidental deletion of files that are still referenced by 
other branches/tags,

Review Comment:
   Nit: I wonder whether all the GC-related warning code would be better placed 
in a separate method (it's quite a few lines of code and comments).
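
   To illustrate the extraction suggested above, here is a minimal, standalone 
sketch. It is not the PR's code: the class name `GcWarningSketch`, the helper 
names `warnOnceAboutIcebergGc` and `propertyAsBoolean`, and the use of plain 
maps and `System.err` instead of `TableMetadata` and the logger are all 
assumptions made to keep the example self-contained. The string keys mirror 
the Iceberg table properties `gc.enabled` (default `true`) and 
`write.metadata.delete-after-commit.enabled` (default `false`), plus the 
`nessie.gc.user.warned` marker from this PR.

```java
import java.util.Map;

// Hypothetical helper class; in the PR this logic lives inline in NessieUtil.
final class GcWarningSketch {
  // Literal keys standing in for the TableProperties / NessieTableOperations constants.
  static final String GC_ENABLED = "gc.enabled";
  static final String METADATA_DELETE_AFTER_COMMIT_ENABLED =
      "write.metadata.delete-after-commit.enabled";
  static final String NESSIE_GC_WARNING_PROPERTY = "nessie.gc.user.warned";

  private GcWarningSketch() {}

  // Simplified stand-in for TableMetadata#propertyAsBoolean.
  static boolean propertyAsBoolean(Map<String, String> props, String key, boolean defaultValue) {
    String value = props.get(key);
    return value == null ? defaultValue : Boolean.parseBoolean(value);
  }

  /**
   * Warns at most once per table when Iceberg's own GC or metadata cleanup is enabled.
   * Records the marker property in newProperties and returns true if a warning was emitted.
   */
  static boolean warnOnceAboutIcebergGc(
      Map<String, String> currentProperties, Map<String, String> newProperties, String tableName) {
    boolean warn =
        propertyAsBoolean(currentProperties, GC_ENABLED, true)
            || propertyAsBoolean(currentProperties, METADATA_DELETE_AFTER_COMMIT_ENABLED, false);

    if (!warn || newProperties.containsKey(NESSIE_GC_WARNING_PROPERTY)) {
      return false;
    }

    // Remember that the user has been warned so later commits stay quiet.
    newProperties.put(NESSIE_GC_WARNING_PROPERTY, "1");
    System.err.printf(
        "The Iceberg property '%s' and/or '%s' is enabled on table '%s' in NessieCatalog.%n",
        GC_ENABLED, METADATA_DELETE_AFTER_COMMIT_ENABLED, tableName);
    return true;
  }
}
```

   With something along these lines, 
`updateTableMetadataWithNessieSpecificProperties` would only need a single 
call to the helper, and the long explanatory comment could move to the 
helper's Javadoc.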



##########
nessie/src/main/java/org/apache/iceberg/nessie/NessieTableOperations.java:
##########
@@ -55,6 +55,8 @@ public class NessieTableOperations extends 
BaseMetastoreTableOperations {
    */
   public static final String NESSIE_COMMIT_ID_PROPERTY = "nessie.commit.id";
 
+  public static final String NESSIE_GC_WARNING_PROPERTY = 
"nessie.gc.user.warned";

Review Comment:
   Should this maybe be something like `nessie.gc.no-warning`?



##########
nessie/src/main/java/org/apache/iceberg/nessie/NessieUtil.java:
##########
@@ -111,23 +111,31 @@ public static TableMetadata 
updateTableMetadataWithNessieSpecificProperties(
     // Update the TableMetadata with the Content of NessieTableState.
     Map<String, String> newProperties = 
Maps.newHashMap(tableMetadata.properties());
     newProperties.put(NessieTableOperations.NESSIE_COMMIT_ID_PROPERTY, 
reference.getHash());
+
     // To prevent accidental deletion of files that are still referenced by 
other branches/tags,
-    // setting GC_ENABLED to false. So that all Iceberg's gc operations like 
expire_snapshots,
-    // remove_orphan_files, drop_table with purge will fail with an error.
-    // Nessie CLI will provide a reference aware GC functionality for the 
expired/unreferenced
+    // setting GC_ENABLED to 'false' is recommended, so that all Iceberg's gc 
operations like
+    // expire_snapshots, remove_orphan_files, drop_table with purge will fail 
with an error.
+    // `nessie-gc` CLI provides a reference aware GC functionality for the 
expired/unreferenced
     // files.
-    newProperties.put(TableProperties.GC_ENABLED, "false");
-
-    boolean metadataCleanupEnabled =
-        newProperties
-            
.getOrDefault(TableProperties.METADATA_DELETE_AFTER_COMMIT_ENABLED, "false")
-            .equalsIgnoreCase("true");
-    if (metadataCleanupEnabled) {
-      newProperties.put(TableProperties.METADATA_DELETE_AFTER_COMMIT_ENABLED, 
"false");
+    // Advanced users may still want to use the simpler Iceberg GC tool iff 
their Nessie Server
+    // contains only one branch (in which case the full Nessie history will be 
reflected in the
+    // Iceberg sequence of snapshots).
+    boolean warn =
+        tableMetadata.propertyAsBoolean(
+                TableProperties.GC_ENABLED, TableProperties.GC_ENABLED_DEFAULT)
+            || tableMetadata.propertyAsBoolean(
+                TableProperties.METADATA_DELETE_AFTER_COMMIT_ENABLED,
+                TableProperties.METADATA_DELETE_AFTER_COMMIT_ENABLED_DEFAULT);
+
+    if (warn && 
!newProperties.containsKey(NessieTableOperations.NESSIE_GC_WARNING_PROPERTY)) {
+      newProperties.put(NessieTableOperations.NESSIE_GC_WARNING_PROPERTY, "1");
       LOG.warn(
-          "Automatic table metadata files cleanup was requested, but disabled 
because "
-              + "the Nessie catalog can use historical metadata files from 
other references. "
-              + "Use the 'nessie-gc' tool for history-aware GC");
+          "Standard Iceberg property '{}' and/or '{}' are enabled on table 
'{}' in NessieCatalog."
+              + " This may make data in historical Nessie commits 
inaccessible."

Review Comment:
   ```suggestion
                 + " This likely makes data in other Nessie branches and tags 
and in earlier, historical Nessie commits inaccessible."
   ```



##########
nessie/src/main/java/org/apache/iceberg/nessie/NessieUtil.java:
##########
@@ -111,23 +111,31 @@ public static TableMetadata 
updateTableMetadataWithNessieSpecificProperties(
     // Update the TableMetadata with the Content of NessieTableState.
     Map<String, String> newProperties = 
Maps.newHashMap(tableMetadata.properties());
     newProperties.put(NessieTableOperations.NESSIE_COMMIT_ID_PROPERTY, 
reference.getHash());
+
     // To prevent accidental deletion of files that are still referenced by 
other branches/tags,
-    // setting GC_ENABLED to false. So that all Iceberg's gc operations like 
expire_snapshots,
-    // remove_orphan_files, drop_table with purge will fail with an error.
-    // Nessie CLI will provide a reference aware GC functionality for the 
expired/unreferenced
+    // setting GC_ENABLED to 'false' is recommended, so that all Iceberg's gc 
operations like
+    // expire_snapshots, remove_orphan_files, drop_table with purge will fail 
with an error.
+    // `nessie-gc` CLI provides a reference aware GC functionality for the 
expired/unreferenced
     // files.
-    newProperties.put(TableProperties.GC_ENABLED, "false");
-
-    boolean metadataCleanupEnabled =
-        newProperties
-            
.getOrDefault(TableProperties.METADATA_DELETE_AFTER_COMMIT_ENABLED, "false")
-            .equalsIgnoreCase("true");
-    if (metadataCleanupEnabled) {
-      newProperties.put(TableProperties.METADATA_DELETE_AFTER_COMMIT_ENABLED, 
"false");
+    // Advanced users may still want to use the simpler Iceberg GC tool iff 
their Nessie Server
+    // contains only one branch (in which case the full Nessie history will be 
reflected in the
+    // Iceberg sequence of snapshots).
+    boolean warn =
+        tableMetadata.propertyAsBoolean(
+                TableProperties.GC_ENABLED, TableProperties.GC_ENABLED_DEFAULT)
+            || tableMetadata.propertyAsBoolean(
+                TableProperties.METADATA_DELETE_AFTER_COMMIT_ENABLED,
+                TableProperties.METADATA_DELETE_AFTER_COMMIT_ENABLED_DEFAULT);
+
+    if (warn && 
!newProperties.containsKey(NessieTableOperations.NESSIE_GC_WARNING_PROPERTY)) {
+      newProperties.put(NessieTableOperations.NESSIE_GC_WARNING_PROPERTY, "1");
       LOG.warn(
-          "Automatic table metadata files cleanup was requested, but disabled 
because "
-              + "the Nessie catalog can use historical metadata files from 
other references. "
-              + "Use the 'nessie-gc' tool for history-aware GC");
+          "Standard Iceberg property '{}' and/or '{}' are enabled on table 
'{}' in NessieCatalog."
+              + " This may make data in historical Nessie commits 
inaccessible."
+              + " Consider setting those properties to 'false' use the 
'nessie-gc' tool for history-aware GC.",

Review Comment:
   ```suggestion
                 + " The recommended setting for those properties is 'false', 
use the 'nessie-gc' tool for Nessie reference aware garbage collection.",
   ```



##########
nessie/src/main/java/org/apache/iceberg/nessie/NessieUtil.java:
##########
@@ -111,23 +111,31 @@ public static TableMetadata 
updateTableMetadataWithNessieSpecificProperties(
     // Update the TableMetadata with the Content of NessieTableState.
     Map<String, String> newProperties = 
Maps.newHashMap(tableMetadata.properties());
     newProperties.put(NessieTableOperations.NESSIE_COMMIT_ID_PROPERTY, 
reference.getHash());
+
     // To prevent accidental deletion of files that are still referenced by 
other branches/tags,
-    // setting GC_ENABLED to false. So that all Iceberg's gc operations like 
expire_snapshots,
-    // remove_orphan_files, drop_table with purge will fail with an error.
-    // Nessie CLI will provide a reference aware GC functionality for the 
expired/unreferenced
+    // setting GC_ENABLED to 'false' is recommended, so that all Iceberg's gc 
operations like
+    // expire_snapshots, remove_orphan_files, drop_table with purge will fail 
with an error.
+    // `nessie-gc` CLI provides a reference aware GC functionality for the 
expired/unreferenced
     // files.
-    newProperties.put(TableProperties.GC_ENABLED, "false");
-
-    boolean metadataCleanupEnabled =
-        newProperties
-            
.getOrDefault(TableProperties.METADATA_DELETE_AFTER_COMMIT_ENABLED, "false")
-            .equalsIgnoreCase("true");
-    if (metadataCleanupEnabled) {
-      newProperties.put(TableProperties.METADATA_DELETE_AFTER_COMMIT_ENABLED, 
"false");
+    // Advanced users may still want to use the simpler Iceberg GC tool iff 
their Nessie Server
+    // contains only one branch (in which case the full Nessie history will be 
reflected in the
+    // Iceberg sequence of snapshots).
+    boolean warn =
+        tableMetadata.propertyAsBoolean(
+                TableProperties.GC_ENABLED, TableProperties.GC_ENABLED_DEFAULT)
+            || tableMetadata.propertyAsBoolean(
+                TableProperties.METADATA_DELETE_AFTER_COMMIT_ENABLED,
+                TableProperties.METADATA_DELETE_AFTER_COMMIT_ENABLED_DEFAULT);
+
+    if (warn && 
!newProperties.containsKey(NessieTableOperations.NESSIE_GC_WARNING_PROPERTY)) {
+      newProperties.put(NessieTableOperations.NESSIE_GC_WARNING_PROPERTY, "1");
       LOG.warn(
-          "Automatic table metadata files cleanup was requested, but disabled 
because "
-              + "the Nessie catalog can use historical metadata files from 
other references. "
-              + "Use the 'nessie-gc' tool for history-aware GC");
+          "Standard Iceberg property '{}' and/or '{}' are enabled on table 
'{}' in NessieCatalog."

Review Comment:
   ```suggestion
             "The Iceberg property '{}' and/or '{}' is enabled on table '{}' in 
NessieCatalog."
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

