jackye1995 commented on code in PR #4423:
URL: https://github.com/apache/iceberg/pull/4423#discussion_r897501883


##########
aws/src/main/java/org/apache/iceberg/aws/glue/GlueCatalog.java:
##########
@@ -273,13 +278,24 @@ private boolean isGlueIcebergTable(Table table) {
   public boolean dropTable(TableIdentifier identifier, boolean purge) {
     try {
       TableOperations ops = newTableOps(identifier);
-      TableMetadata lastMetadata = ops.current();
+
+      GlueTableOperations glueOps = (GlueTableOperations) ops;
+      S3FileIO s3FileIO = (S3FileIO) glueOps.io();
+      TableMetadata lastMetadata = null;
+      boolean isTablePurged = isTablePurged(identifier, s3FileIO.client());
+      if (!isTablePurged) {
+        lastMetadata = ops.current();
+      }
+
       glue.deleteTable(DeleteTableRequest.builder()
           .catalogId(awsProperties.glueCatalogId())
           .databaseName(IcebergToGlueConverter.getDatabaseName(identifier))
           .name(identifier.name())
           .build());
       LOG.info("Successfully dropped table {} from Glue", identifier);
+      ValidationException.check(!purge || !awsProperties.glueLakeFormationEnabled() || isTablePurged,
+          "Cannot purge table with LakeFormation enabled because S3 access is lost after table is dropped. " +

Review Comment:
   > I think we still want the glue table to be dropped even LF enabled and 
purge is set to true?
   
   Let's think about this again...
   
   What we know is that if the table is LF enabled and purge is true, there is no way we can make this happen safely, because:
   1. if we drop the table first, we can pre-fetch the S3 credentials, but they might not last long enough to remove all the table data files
   2. if we remove all the table data files first, we might fail in the middle, which causes orphan files.
   
   Originally I suggested that, since there is no perfect way, we should just ask customers to purge by themselves, and validate in Iceberg to disallow purging when LF is enabled. But thinking about it again, this is known behavior even for other normal tables, so it looks like we should go with option 2: first remove the table data files (all failures are already suppressed in that operation), and then drop the table.
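   A rough sketch of option 2's ordering (the `FileStore` and `Catalog` interfaces here are hypothetical stand-ins, not the real Iceberg/Glue APIs):

   ```java
   import java.util.ArrayList;
   import java.util.List;

   // Sketch of option 2: purge data files first, suppressing per-file
   // failures, then drop the catalog entry. FileStore and Catalog are
   // hypothetical stand-ins, not the real Iceberg/Glue APIs.
   public class PurgeThenDropSketch {

     interface FileStore {
       void delete(String path) throws Exception;
     }

     interface Catalog {
       void dropTable(String name);
     }

     // Returns the files whose deletion failed (suppressed; these may
     // remain as orphan files).
     static List<String> purgeThenDrop(Catalog catalog, FileStore store,
                                       String table, List<String> dataFiles) {
       List<String> failed = new ArrayList<>();
       // Step 1: remove data files while S3 access is still valid;
       // individual failures are suppressed and collected, not rethrown.
       for (String file : dataFiles) {
         try {
           store.delete(file);
         } catch (Exception e) {
           failed.add(file);
         }
       }
       // Step 2: only after the purge attempt, drop the catalog entry;
       // with LakeFormation, S3 access would be lost past this point.
       catalog.dropTable(table);
       return failed;
     }
   }
   ```

   The key point is only the ordering: the catalog drop happens strictly after the best-effort file purge, so we never reach the state where the table is gone but data files are still pending deletion without credentials.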
   
   + @amogh-jahagirdar any thoughts?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

