amogh-jahagirdar commented on code in PR #4423:
URL: https://github.com/apache/iceberg/pull/4423#discussion_r903236482


##########
aws/src/main/java/org/apache/iceberg/aws/glue/GlueCatalog.java:
##########
@@ -273,13 +278,24 @@ private boolean isGlueIcebergTable(Table table) {
   public boolean dropTable(TableIdentifier identifier, boolean purge) {
     try {
       TableOperations ops = newTableOps(identifier);
-      TableMetadata lastMetadata = ops.current();
+
+      GlueTableOperations glueOps = (GlueTableOperations) ops;
+      S3FileIO s3FileIO = (S3FileIO) glueOps.io();
+      TableMetadata lastMetadata = null;
+      boolean isTablePurged = isTablePurged(identifier, s3FileIO.client());
+      if (!isTablePurged) {
+        lastMetadata = ops.current();
+      }
+
       glue.deleteTable(DeleteTableRequest.builder()
           .catalogId(awsProperties.glueCatalogId())
           .databaseName(IcebergToGlueConverter.getDatabaseName(identifier))
           .name(identifier.name())
           .build());
       LOG.info("Successfully dropped table {} from Glue", identifier);
+      ValidationException.check(!purge || !awsProperties.glueLakeFormationEnabled() || isTablePurged,
+          "Cannot purge table with LakeFormation enabled because S3 access is lost after table is dropped. " +

Review Comment:
   Sorry for the late reply. Yes, pre-fetching credentials and then deleting the 
table is too brittle, because those credentials would be invalidated once the 
table is deleted. For 2, however, the case I am thinking of is this: if we do a 
best-effort deletion of the files first, concurrent operations that don't 
conflict (like an append) could still succeed. If dropping the table then 
fails, the table is left in a bad state. To address this, we could introduce a 
DROP_IN_PROGRESS state that is set first and then checked at commit time to 
reject commits against a table that is being deleted. I think any attempt to 
delete files should only happen once it's impossible to modify the table 
further.
   
   @jackye1995 @xiaoxuandev Let me know if I misunderstood the problem. I'll 
think more on it, but I think 1 can certainly be ruled out.
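
   As a rough illustration of the DROP_IN_PROGRESS idea above, here is a 
minimal in-memory sketch. It is not Iceberg or Glue code: `DropGuardSketch`, 
`State`, `markDropInProgress`, `purgeFiles` and `finishDrop` are hypothetical 
names, and the list of strings only stands in for the table's files. In a real 
catalog the state would have to live in the catalog entry and be checked 
atomically as part of the commit.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch only: none of these types exist in Iceberg or the AWS SDK.
class DropGuardSketch {
  enum State { ACTIVE, DROP_IN_PROGRESS, DROPPED }

  static class Table {
    private State state = State.ACTIVE;
    private final List<String> files = new ArrayList<>();

    synchronized State state() {
      return state;
    }

    // A commit (e.g. an append) succeeds only while the table is ACTIVE.
    synchronized boolean commit(String file) {
      if (state != State.ACTIVE) {
        return false; // reject commits against a table that is being dropped
      }
      files.add(file);
      return true;
    }

    // Step 1: mark the table so that no further commits can succeed.
    synchronized boolean markDropInProgress() {
      if (state != State.ACTIVE) {
        return false;
      }
      state = State.DROP_IN_PROGRESS;
      return true;
    }

    // Step 2: delete files, allowed only once the table can no longer change.
    synchronized int purgeFiles() {
      if (state != State.DROP_IN_PROGRESS) {
        throw new IllegalStateException("purge requires DROP_IN_PROGRESS, was " + state);
      }
      int deleted = files.size();
      files.clear(); // stands in for deleting data and metadata files
      return deleted;
    }

    // Step 3: remove the catalog entry (stands in for the Glue deleteTable call).
    synchronized void finishDrop() {
      if (state != State.DROP_IN_PROGRESS) {
        throw new IllegalStateException("drop was not started, state is " + state);
      }
      state = State.DROPPED;
    }
  }
}
```

   With this ordering, an append that arrives after `markDropInProgress()` is 
rejected, so the set of files seen by `purgeFiles()` cannot grow while the 
drop is running. If `finishDrop()` fails, the table stays in DROP_IN_PROGRESS 
and the drop can be retried without reopening the table for writes.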



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

