nastra commented on code in PR #8382:
URL: https://github.com/apache/iceberg/pull/8382#discussion_r1324010665


##########
nessie/src/main/java/org/apache/iceberg/nessie/NessieCatalog.java:
##########
@@ -58,12 +60,18 @@ public class NessieCatalog extends BaseMetastoreCatalog
   private static final Logger LOG = LoggerFactory.getLogger(NessieCatalog.class);
   private static final Joiner SLASH = Joiner.on("/");
   private static final String NAMESPACE_LOCATION_PROPS = "location";
+
+  private static final Map<String, String> DEFAULT_CATALOG_OPTIONS =
+      ImmutableMap.<String, String>builder()
+          .put(CatalogProperties.TABLE_DEFAULT_PREFIX + TableProperties.GC_ENABLED, "false")
+          .build();
+
   private NessieIcebergClient client;
   private String warehouseLocation;
   private Object config;
   private String name;
   private FileIO fileIO;
-  private Map<String, String> catalogOptions;
+  @Nonnull private Map<String, String> catalogOptions = DEFAULT_CATALOG_OPTIONS;

Review Comment:
   nit: I don't think `@Nonnull` is needed here, and we typically don't use it anywhere in the codebase.



##########
nessie/src/test/java/org/apache/iceberg/nessie/TestNessieTable.java:
##########
@@ -589,18 +591,43 @@ public void testGCEnabled() {
   }
 
   @Test
-  public void testTableMetadataFilesCleanupDisable() throws NessieNotFoundException {
+  public void testGCEnabled() {
     Table icebergTable = catalog.loadTable(TABLE_IDENTIFIER);
+    icebergTable.updateProperties().set(TableProperties.GC_ENABLED, "true").commit();
+    Assertions.assertThat(icebergTable.properties().get(TableProperties.GC_ENABLED))
+        .isEqualTo("true");
 
-    // Forceful setting of property also should get override with false
-    icebergTable
-        .updateProperties()
-        .set(TableProperties.METADATA_DELETE_AFTER_COMMIT_ENABLED, "true")
-        .commit();
-    Assertions.assertThat(
-            icebergTable.properties().get(TableProperties.METADATA_DELETE_AFTER_COMMIT_ENABLED))
-        .isNotNull()
-        .isEqualTo("false");
+    Assertions.assertThatCode(
+            () ->
+                icebergTable.expireSnapshots().expireOlderThan(System.currentTimeMillis()).commit())
+        .doesNotThrowAnyException();
+  }
+
+  @Test
+  public void testGCEnabledViaCatalogProperties() {

Review Comment:
   ```suggestion
     public void testGCEnabledViaTableDefaultCatalogProperty() {
   ```



##########
nessie/src/test/java/org/apache/iceberg/nessie/TestNessieTable.java:
##########
@@ -589,18 +591,43 @@ public void testGCEnabled() {
   }
 
   @Test
-  public void testTableMetadataFilesCleanupDisable() throws NessieNotFoundException {
+  public void testGCEnabled() {
     Table icebergTable = catalog.loadTable(TABLE_IDENTIFIER);
+    icebergTable.updateProperties().set(TableProperties.GC_ENABLED, "true").commit();
+    Assertions.assertThat(icebergTable.properties().get(TableProperties.GC_ENABLED))

Review Comment:
   Could you also please update the one at L581 to
   `Assertions.assertThat(icebergTable.properties()).containsEntry(TableProperties.GC_ENABLED, "false");`?



##########
nessie/src/main/java/org/apache/iceberg/nessie/NessieUtil.java:
##########
@@ -102,6 +102,38 @@ private static String commitAuthor(Map<String, String> catalogOptions) {
         .orElseGet(() -> System.getProperty("user.name"));
   }
 
+  private static void checkAndUpdateGCProperties(
+      TableMetadata tableMetadata, Map<String, String> updatedProperties, String identifier) {
+    if (tableMetadata.propertyAsBoolean(
+        NessieTableOperations.NESSIE_GC_NO_WARNING_PROPERTY, false)) {
+      return;
+    }
+
+    // To prevent accidental deletion of files that are still referenced by other branches/tags,
+    // setting GC_ENABLED to 'false' is recommended, so that all Iceberg's gc operations like
+    // expire_snapshots, remove_orphan_files, drop_table with purge will fail with an error.
+    // `nessie-gc` CLI provides a reference-aware GC functionality for the expired/unreferenced
+    // files.
+    // Advanced users may still want to use the simpler Iceberg GC tools iff their Nessie Server
+    // contains only one branch (in which case the full Nessie history will be reflected in the
+    // Iceberg sequence of snapshots).
+    if (tableMetadata.propertyAsBoolean(
+            TableProperties.GC_ENABLED, TableProperties.GC_ENABLED_DEFAULT)
+        || tableMetadata.propertyAsBoolean(
+            TableProperties.METADATA_DELETE_AFTER_COMMIT_ENABLED,
+            TableProperties.METADATA_DELETE_AFTER_COMMIT_ENABLED_DEFAULT)) {

Review Comment:
   I think Nessie might want to use `false` explicitly here rather than rely on whatever the
   Iceberg default is. That's at least what the previous code was doing.
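
   A minimal sketch of the difference, not the actual Nessie or Iceberg code. It assumes a
   hypothetical `propertyAsBoolean` helper over a plain `Map` as a stand-in for
   `TableMetadata#propertyAsBoolean`, and it assumes the property key is `gc.enabled` with an
   Iceberg default of `true`. The class and method names are made up for illustration.

```java
import java.util.Map;

public class GcDefaultSketch {
  // Assumed values for illustration; the real constants live in Iceberg's TableProperties.
  static final String GC_ENABLED = "gc.enabled";
  static final boolean ICEBERG_GC_ENABLED_DEFAULT = true;

  // Hypothetical stand-in for TableMetadata#propertyAsBoolean, backed by a plain map.
  static boolean propertyAsBoolean(Map<String, String> props, String key, boolean defaultValue) {
    String value = props.get(key);
    return value == null ? defaultValue : Boolean.parseBoolean(value);
  }

  // Relies on the (assumed) Iceberg default: an unset flag reads as "GC enabled".
  static boolean gcEnabledWithIcebergDefault(Map<String, String> props) {
    return propertyAsBoolean(props, GC_ENABLED, ICEBERG_GC_ENABLED_DEFAULT);
  }

  // Uses an explicit false: an unset flag reads as "GC disabled".
  static boolean gcEnabledWithExplicitFalse(Map<String, String> props) {
    return propertyAsBoolean(props, GC_ENABLED, false);
  }

  public static void main(String[] args) {
    Map<String, String> unset = Map.of();
    System.out.println("Iceberg default, flag unset: " + gcEnabledWithIcebergDefault(unset));
    System.out.println("explicit false, flag unset:  " + gcEnabledWithExplicitFalse(unset));
  }
}
```

   With the flag unset, only the explicit-`false` variant treats GC as disabled. When the flag
   is set, both variants return the stored value.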



##########
nessie/src/test/java/org/apache/iceberg/nessie/TestNessieTable.java:
##########
@@ -589,18 +591,43 @@ public void testGCEnabled() {
   }
 
   @Test
-  public void testTableMetadataFilesCleanupDisable() throws NessieNotFoundException {
+  public void testGCEnabled() {
     Table icebergTable = catalog.loadTable(TABLE_IDENTIFIER);
+    icebergTable.updateProperties().set(TableProperties.GC_ENABLED, "true").commit();
+    Assertions.assertThat(icebergTable.properties().get(TableProperties.GC_ENABLED))

Review Comment:
   `Assertions.assertThat(icebergTable.properties()).containsEntry(TableProperties.GC_ENABLED, "true");`
   is usually preferable, as it will print the content of the map if the check ever fails.



##########
nessie/src/main/java/org/apache/iceberg/nessie/NessieUtil.java:
##########
@@ -102,6 +102,38 @@ private static String commitAuthor(Map<String, String> catalogOptions) {
         .orElseGet(() -> System.getProperty("user.name"));
   }
 
+  private static void checkAndUpdateGCProperties(
+      TableMetadata tableMetadata, Map<String, String> updatedProperties, String identifier) {
+    if (tableMetadata.propertyAsBoolean(
+        NessieTableOperations.NESSIE_GC_NO_WARNING_PROPERTY, false)) {
+      return;
+    }
+
+    // To prevent accidental deletion of files that are still referenced by other branches/tags,
+    // setting GC_ENABLED to 'false' is recommended, so that all Iceberg's gc operations like
+    // expire_snapshots, remove_orphan_files, drop_table with purge will fail with an error.
+    // `nessie-gc` CLI provides a reference-aware GC functionality for the expired/unreferenced
+    // files.
+    // Advanced users may still want to use the simpler Iceberg GC tools iff their Nessie Server
+    // contains only one branch (in which case the full Nessie history will be reflected in the
+    // Iceberg sequence of snapshots).
+    if (tableMetadata.propertyAsBoolean(
+            TableProperties.GC_ENABLED, TableProperties.GC_ENABLED_DEFAULT)

Review Comment:
   Does Nessie really want to default to GC enabled here if the flag isn't set for some reason?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
