nastra commented on code in PR #8382:
URL: https://github.com/apache/iceberg/pull/8382#discussion_r1324010665
##########
nessie/src/main/java/org/apache/iceberg/nessie/NessieCatalog.java:
##########
@@ -58,12 +60,18 @@ public class NessieCatalog extends BaseMetastoreCatalog
private static final Logger LOG = LoggerFactory.getLogger(NessieCatalog.class);
private static final Joiner SLASH = Joiner.on("/");
private static final String NAMESPACE_LOCATION_PROPS = "location";
+
+ private static final Map<String, String> DEFAULT_CATALOG_OPTIONS =
+ ImmutableMap.<String, String>builder()
+ .put(CatalogProperties.TABLE_DEFAULT_PREFIX + TableProperties.GC_ENABLED, "false")
+ .build();
+
private NessieIcebergClient client;
private String warehouseLocation;
private Object config;
private String name;
private FileIO fileIO;
- private Map<String, String> catalogOptions;
+ @Nonnull private Map<String, String> catalogOptions = DEFAULT_CATALOG_OPTIONS;
Review Comment:
nit: I don't think `@Nonnull` is needed here; we typically don't use it
anywhere in the codebase
##########
nessie/src/test/java/org/apache/iceberg/nessie/TestNessieTable.java:
##########
@@ -589,18 +591,43 @@ public void testGCEnabled() {
}
@Test
- public void testTableMetadataFilesCleanupDisable() throws NessieNotFoundException {
+ public void testGCEnabled() {
Table icebergTable = catalog.loadTable(TABLE_IDENTIFIER);
+ icebergTable.updateProperties().set(TableProperties.GC_ENABLED, "true").commit();
+ Assertions.assertThat(icebergTable.properties().get(TableProperties.GC_ENABLED))
+ .isEqualTo("true");
- // Forceful setting of property also should get override with false
- icebergTable
- .updateProperties()
- .set(TableProperties.METADATA_DELETE_AFTER_COMMIT_ENABLED, "true")
- .commit();
- Assertions.assertThat(
- icebergTable.properties().get(TableProperties.METADATA_DELETE_AFTER_COMMIT_ENABLED))
- .isNotNull()
- .isEqualTo("false");
+ Assertions.assertThatCode(
+ () ->
+ icebergTable.expireSnapshots().expireOlderThan(System.currentTimeMillis()).commit())
+ .doesNotThrowAnyException();
+ }
+
+ @Test
+ public void testGCEnabledViaCatalogProperties() {
Review Comment:
```suggestion
public void testGCEnabledViaTableDefaultCatalogProperty() {
```
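For context on the rename: as far as I understand it, catalog properties that start with the `table-default.` prefix are stripped of that prefix and used as default table properties, which is what this test exercises. A minimal sketch of that mapping, using only the JDK; `TableDefaultProps`, `tableDefaults` and the literal prefix value are illustrative assumptions, not the catalog's actual code:

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper that mimics how prefixed catalog properties become table defaults. */
final class TableDefaultProps {
  // Assumed to mirror CatalogProperties.TABLE_DEFAULT_PREFIX.
  static final String TABLE_DEFAULT_PREFIX = "table-default.";

  /** Returns the catalog properties that carry the prefix, with the prefix removed. */
  static Map<String, String> tableDefaults(Map<String, String> catalogProps) {
    Map<String, String> defaults = new HashMap<>();
    for (Map.Entry<String, String> entry : catalogProps.entrySet()) {
      if (entry.getKey().startsWith(TABLE_DEFAULT_PREFIX)) {
        String tableKey = entry.getKey().substring(TABLE_DEFAULT_PREFIX.length());
        defaults.put(tableKey, entry.getValue());
      }
    }
    return defaults;
  }

  public static void main(String[] args) {
    Map<String, String> catalogProps =
        Map.of("table-default.gc.enabled", "true", "warehouse", "/tmp/warehouse");
    // Only the prefixed entry survives, keyed as a plain table property.
    System.out.println(tableDefaults(catalogProps));
  }
}
```

Since the property reaches the table through the table-default mechanism rather than as a generic catalog option, the suggested test name describes the behavior more precisely.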
##########
nessie/src/test/java/org/apache/iceberg/nessie/TestNessieTable.java:
##########
@@ -589,18 +591,43 @@ public void testGCEnabled() {
}
@Test
- public void testTableMetadataFilesCleanupDisable() throws NessieNotFoundException {
+ public void testGCEnabled() {
Table icebergTable = catalog.loadTable(TABLE_IDENTIFIER);
+ icebergTable.updateProperties().set(TableProperties.GC_ENABLED, "true").commit();
+ Assertions.assertThat(icebergTable.properties().get(TableProperties.GC_ENABLED))
Review Comment:
Could you also please update the one in L581 to
`Assertions.assertThat(icebergTable.properties()).containsEntry(TableProperties.GC_ENABLED, "false");`
##########
nessie/src/main/java/org/apache/iceberg/nessie/NessieUtil.java:
##########
@@ -102,6 +102,38 @@ private static String commitAuthor(Map<String, String> catalogOptions) {
.orElseGet(() -> System.getProperty("user.name"));
}
+ private static void checkAndUpdateGCProperties(
+ TableMetadata tableMetadata, Map<String, String> updatedProperties, String identifier) {
+ if (tableMetadata.propertyAsBoolean(
+ NessieTableOperations.NESSIE_GC_NO_WARNING_PROPERTY, false)) {
+ return;
+ }
+
+ // To prevent accidental deletion of files that are still referenced by other branches/tags,
+ // setting GC_ENABLED to 'false' is recommended, so that all Iceberg's gc operations like
+ // expire_snapshots, remove_orphan_files, drop_table with purge will fail with an error.
+ // `nessie-gc` CLI provides a reference-aware GC functionality for the expired/unreferenced
+ // files.
+ // Advanced users may still want to use the simpler Iceberg GC tools iff their Nessie Server
+ // contains only one branch (in which case the full Nessie history will be reflected in the
+ // Iceberg sequence of snapshots).
+ if (tableMetadata.propertyAsBoolean(
+ TableProperties.GC_ENABLED, TableProperties.GC_ENABLED_DEFAULT)
+ || tableMetadata.propertyAsBoolean(
+ TableProperties.METADATA_DELETE_AFTER_COMMIT_ENABLED,
+ TableProperties.METADATA_DELETE_AFTER_COMMIT_ENABLED_DEFAULT)) {
Review Comment:
I think nessie might want to use `false` explicitly here rather than rely on
whatever the Iceberg default is? That's at least what the previous code was
doing
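
To illustrate the difference, here is a minimal sketch using only the JDK. `GcFlagDefaults` and its `propertyAsBoolean` are hypothetical stand-ins rather than the Iceberg API, and the constants assume that `TableProperties.GC_ENABLED` is `"gc.enabled"` with an Iceberg default of `true`:

```java
import java.util.Map;

/** Hypothetical stand-in for a table-property lookup; not the Iceberg API. */
final class GcFlagDefaults {
  // Assumed to mirror TableProperties.GC_ENABLED and TableProperties.GC_ENABLED_DEFAULT.
  static final String GC_ENABLED = "gc.enabled";
  static final boolean ICEBERG_GC_ENABLED_DEFAULT = true;

  /** Returns the boolean value stored under key, or defaultValue when the key is absent. */
  static boolean propertyAsBoolean(Map<String, String> props, String key, boolean defaultValue) {
    String value = props.get(key);
    return value == null ? defaultValue : Boolean.parseBoolean(value);
  }

  public static void main(String[] args) {
    Map<String, String> unset = Map.of();
    // Falling back to the Iceberg default treats an unset flag as "GC enabled".
    System.out.println(propertyAsBoolean(unset, GC_ENABLED, ICEBERG_GC_ENABLED_DEFAULT));
    // An explicit false makes the result independent of the Iceberg default.
    System.out.println(propertyAsBoolean(unset, GC_ENABLED, false));
  }
}
```

With the explicit `false`, the outcome of the check no longer changes if the Iceberg default ever does.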
##########
nessie/src/test/java/org/apache/iceberg/nessie/TestNessieTable.java:
##########
@@ -589,18 +591,43 @@ public void testGCEnabled() {
}
@Test
- public void testTableMetadataFilesCleanupDisable() throws NessieNotFoundException {
+ public void testGCEnabled() {
Table icebergTable = catalog.loadTable(TABLE_IDENTIFIER);
+ icebergTable.updateProperties().set(TableProperties.GC_ENABLED, "true").commit();
+ Assertions.assertThat(icebergTable.properties().get(TableProperties.GC_ENABLED))
Review Comment:
`Assertions.assertThat(icebergTable.properties()).containsEntry(TableProperties.GC_ENABLED, "true");`
is usually preferable, as it will print the content of the map if the
check ever fails
##########
nessie/src/main/java/org/apache/iceberg/nessie/NessieUtil.java:
##########
@@ -102,6 +102,38 @@ private static String commitAuthor(Map<String, String> catalogOptions) {
.orElseGet(() -> System.getProperty("user.name"));
}
+ private static void checkAndUpdateGCProperties(
+ TableMetadata tableMetadata, Map<String, String> updatedProperties, String identifier) {
+ if (tableMetadata.propertyAsBoolean(
+ NessieTableOperations.NESSIE_GC_NO_WARNING_PROPERTY, false)) {
+ return;
+ }
+
+ // To prevent accidental deletion of files that are still referenced by other branches/tags,
+ // setting GC_ENABLED to 'false' is recommended, so that all Iceberg's gc operations like
+ // expire_snapshots, remove_orphan_files, drop_table with purge will fail with an error.
+ // `nessie-gc` CLI provides a reference-aware GC functionality for the expired/unreferenced
+ // files.
+ // Advanced users may still want to use the simpler Iceberg GC tools iff their Nessie Server
+ // contains only one branch (in which case the full Nessie history will be reflected in the
+ // Iceberg sequence of snapshots).
+ if (tableMetadata.propertyAsBoolean(
+ TableProperties.GC_ENABLED, TableProperties.GC_ENABLED_DEFAULT)
Review Comment:
does nessie really want to default to gc enabled here if the flag isn't set
for some reason?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]