dimas-b commented on code in PR #8382:
URL: https://github.com/apache/iceberg/pull/8382#discussion_r1324501616
##########
nessie/src/main/java/org/apache/iceberg/nessie/NessieUtil.java:
##########
@@ -111,23 +111,31 @@ public static TableMetadata updateTableMetadataWithNessieSpecificProperties(
// Update the TableMetadata with the Content of NessieTableState.
Map<String, String> newProperties = Maps.newHashMap(tableMetadata.properties());
newProperties.put(NessieTableOperations.NESSIE_COMMIT_ID_PROPERTY, reference.getHash());
+
// To prevent accidental deletion of files that are still referenced by other branches/tags,
- // setting GC_ENABLED to false. So that all Iceberg's gc operations like expire_snapshots,
- // remove_orphan_files, drop_table with purge will fail with an error.
- // Nessie CLI will provide a reference aware GC functionality for the expired/unreferenced
+ // setting GC_ENABLED to 'false' is recommended, so that all Iceberg's gc operations like
+ // expire_snapshots, remove_orphan_files, drop_table with purge will fail with an error.
+ // `nessie-gc` CLI provides a reference aware GC functionality for the expired/unreferenced
// files.
- newProperties.put(TableProperties.GC_ENABLED, "false");
-
- boolean metadataCleanupEnabled =
- newProperties
- .getOrDefault(TableProperties.METADATA_DELETE_AFTER_COMMIT_ENABLED, "false")
- .equalsIgnoreCase("true");
- if (metadataCleanupEnabled) {
- newProperties.put(TableProperties.METADATA_DELETE_AFTER_COMMIT_ENABLED, "false");
+ // Advanced users may still want to use the simpler Iceberg GC tool iff their Nessie Server
+ // contains only one branch (in which case the full Nessie history will be reflected in the
+ // Iceberg sequence of snapshots).
+ boolean warn =
+ tableMetadata.propertyAsBoolean(
+ TableProperties.GC_ENABLED, TableProperties.GC_ENABLED_DEFAULT)
+ || tableMetadata.propertyAsBoolean(
+ TableProperties.METADATA_DELETE_AFTER_COMMIT_ENABLED,
+ TableProperties.METADATA_DELETE_AFTER_COMMIT_ENABLED_DEFAULT);
+
+ if (warn && !newProperties.containsKey(NessieTableOperations.NESSIE_GC_WARNING_PROPERTY)) {
+ newProperties.put(NessieTableOperations.NESSIE_GC_WARNING_PROPERTY, "1");
LOG.warn(
- "Automatic table metadata files cleanup was requested, but disabled because "
- + "the Nessie catalog can use historical metadata files from other references. "
- + "Use the 'nessie-gc' tool for history-aware GC");
+ "Standard Iceberg property '{}' and/or '{}' are enabled on table '{}' in NessieCatalog."
+ + " This may make data in historical Nessie commits inaccessible."
Review Comment:
rephrased
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]