amogh-jahagirdar commented on code in PR #4423:
URL: https://github.com/apache/iceberg/pull/4423#discussion_r903236482
##########
aws/src/main/java/org/apache/iceberg/aws/glue/GlueCatalog.java:
##########
@@ -273,13 +278,24 @@ private boolean isGlueIcebergTable(Table table) {
public boolean dropTable(TableIdentifier identifier, boolean purge) {
try {
TableOperations ops = newTableOps(identifier);
- TableMetadata lastMetadata = ops.current();
+
+ GlueTableOperations glueOps = (GlueTableOperations) ops;
+ S3FileIO s3FileIO = (S3FileIO) glueOps.io();
+ TableMetadata lastMetadata = null;
+ boolean isTablePurged = isTablePurged(identifier, s3FileIO.client());
+ if (!isTablePurged) {
+ lastMetadata = ops.current();
+ }
+
glue.deleteTable(DeleteTableRequest.builder()
.catalogId(awsProperties.glueCatalogId())
.databaseName(IcebergToGlueConverter.getDatabaseName(identifier))
.name(identifier.name())
.build());
LOG.info("Successfully dropped table {} from Glue", identifier);
+ ValidationException.check(!purge || !awsProperties.glueLakeFormationEnabled() || isTablePurged,
+ "Cannot purge table with LakeFormation enabled because S3 access is lost after table is dropped. " +
Review Comment:
Sorry for the late reply, but yes, pre-fetching credentials and then deleting the
table is too brittle, because those credentials would be invalidated. However,
the case I'm thinking of is doing a best-effort deletion of the files first: if
there are concurrent operations that don't conflict (such as an append), those
could still succeed. If dropping the table then fails, the table is left in a
bad state. To address this, a DROP_IN_PROGRESS state could be introduced that is
set first and then checked at commit time to prevent commits against a table
that is being deleted. I think any attempt to delete files should only happen
once it's impossible to modify the table further.

@jackye1995 @xiaoxuandev Let me know if I misunderstood the problem, though.
I'll think more about it, but I think option 1 can certainly be ruled out.
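The proposed DROP_IN_PROGRESS ordering can be sketched as a minimal, hypothetical model. This is not Iceberg or Glue code: an `AtomicReference` stands in for the catalog's pointer to the current table metadata, and every commit (here only an append) is an optimistic compare-and-set against that pointer. The names `DropInProgressSketch`, `State`, `Metadata`, `Table`, `beginDrop`, and `drop` are all invented for illustration. The property it shows is that the marker lives in the same CAS-guarded pointer that commits update, so once the drop starts no append can slip in, and a failure in the final catalog delete leaves the table frozen rather than pointing at files that were already deleted.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

// Hypothetical sketch: none of these types exist in Iceberg; they only model the protocol.
public class DropInProgressSketch {

  enum State { ACTIVE, DROP_IN_PROGRESS }

  // Immutable metadata version: the table state plus the data files it references.
  static final class Metadata {
    final State state;
    final List<String> files;

    Metadata(State state, List<String> files) {
      this.state = state;
      this.files = Collections.unmodifiableList(new ArrayList<>(files));
    }
  }

  static final class Table {
    // Stands in for the catalog's pointer to the current metadata, swapped with CAS.
    private final AtomicReference<Metadata> current =
        new AtomicReference<>(new Metadata(State.ACTIVE, new ArrayList<>()));

    // Append commit: rejected once a drop has started, retried if another commit won the race.
    void append(String file) {
      while (true) {
        Metadata base = current.get();
        if (base.state != State.ACTIVE) {
          throw new IllegalStateException("Cannot commit: table is being dropped");
        }
        List<String> files = new ArrayList<>(base.files);
        files.add(file);
        if (current.compareAndSet(base, new Metadata(State.ACTIVE, files))) {
          return;
        }
      }
    }

    // Step 1: atomically set DROP_IN_PROGRESS and return the files that are now frozen.
    List<String> beginDrop() {
      while (true) {
        Metadata base = current.get();
        Metadata marked = new Metadata(State.DROP_IN_PROGRESS, base.files);
        if (current.compareAndSet(base, marked)) {
          return marked.files;
        }
      }
    }

    // Steps 2 and 3: delete the files, then remove the catalog entry. If step 3 throws,
    // the marker stays set, so no later commit can reference the deleted files.
    void drop(Consumer<String> deleteFile, Runnable removeFromCatalog) {
      for (String file : beginDrop()) {
        deleteFile.accept(file);
      }
      removeFromCatalog.run();
    }

    State state() {
      return current.get().state;
    }
  }

  public static void main(String[] args) {
    Table table = new Table();
    table.append("s3://bucket/data/file-1.parquet");

    List<String> deleted = new ArrayList<>();
    try {
      table.drop(deleted::add, () -> {
        throw new RuntimeException("simulated catalog delete failure");
      });
    } catch (RuntimeException e) {
      System.out.println("drop failed: " + e.getMessage());
    }

    boolean appendRejected = false;
    try {
      table.append("s3://bucket/data/file-2.parquet");
    } catch (IllegalStateException e) {
      appendRejected = true;
    }
    System.out.println(deleted);        // [s3://bucket/data/file-1.parquet]
    System.out.println(table.state());  // DROP_IN_PROGRESS
    System.out.println(appendRejected); // true
  }
}
```

Running `main` shows the failure mode from the comment being contained: the catalog delete fails after the file was removed, but the concurrent append is still rejected. Because `beginDrop` can be called again from the DROP_IN_PROGRESS state, a real implementation could presumably retry the drop from there; how such a marker would be persisted durably is left open.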