szehon-ho opened a new pull request #4123:
URL: https://github.com/apache/iceberg/pull/4123


   Tries to fix #4090 
   
   (Copied from my comment there)
   
   From the log, the underlying failure still seems to be the expected
ValidationException. However, it does not come from the distributed Spark
delta-writer job, as the test expects, but from the non-distributed,
metadata-only delete. Ref:
   
   Expected: an instance of org.apache.spark.SparkException
        but: <java.lang.IllegalArgumentException: Failed to cleanly delete data files matching: ref(name="id") == 1> is a java.lang.IllegalArgumentException
        at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
   
   That's from here (the metadata-only version):
   
https://github.com/apache/iceberg/blob/master/spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/source/SparkTable.java#L271
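   The sketch below shows why the Hamcrest assertion fails. It assumes, as the log message suggests, that the metadata-only path catches a commit-time ValidationException and rethrows it as an IllegalArgumentException.

   The nested ValidationException class and the deleteWhere helper are hypothetical stand-ins. They exist only so the example compiles without Iceberg, and this is not the actual SparkTable code.

```java
public class MetadataDeleteWrapping {
  // Hypothetical stand-in for org.apache.iceberg.exceptions.ValidationException,
  // declared locally so the sketch compiles without Iceberg on the classpath.
  static class ValidationException extends RuntimeException {
    ValidationException(String message) {
      super(message);
    }
  }

  // Simplified metadata-only delete: a validation failure during commit is
  // rethrown as IllegalArgumentException, so a caller expecting a
  // SparkException sees a different exception type.
  static void deleteWhere(String filter, Runnable commit) {
    try {
      commit.run();
    } catch (ValidationException e) {
      throw new IllegalArgumentException(
          "Failed to cleanly delete data files matching: " + filter, e);
    }
  }

  public static void main(String[] args) {
    try {
      deleteWhere("ref(name=\"id\") == 1", () -> {
        throw new ValidationException("simulated commit conflict");
      });
    } catch (IllegalArgumentException e) {
      // Prints: Failed to cleanly delete data files matching: ref(name="id") == 1
      System.out.println(e.getMessage());
    }
  }
}
```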
   
   Although the test usually hits the distributed delta delete, it can sometimes
fall back to this metadata-only path. That happens when the optimizer
(OptimizeMetadataOnlyDeleteFromTable in particular) decides that the delete can
be handled with metadata alone.

   The appendFuture tries to prevent this by writing a file with two rows
(1, 2). Each deleteFuture targets only id 1, so it cannot drop that file whole
and therefore cannot use a metadata-only delete. However, if no append has run
yet, the table is empty and the deleteFuture proceeds with a metadata-only
delete.

   Most of the time that delete succeeds, because there is nothing to delete,
and the test gets another try at producing the right exception. In a very
small percentage of runs, though, the appendFuture commits just before the
delete commits, and the delete then fails in the code path above.
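   The optimizer's decision can be approximated with a simplified model. This is an assumption, not Iceberg's implementation. Under this model, a delete of id == 1 can be metadata-only only if every file that might contain id 1 is known, from its min/max stats, to contain nothing but id 1.

   FileStats, mightMatch, allMatch and canDeleteUsingMetadata are hypothetical names. The sketch needs Java 16+ for records.

```java
import java.util.List;

public class MetadataOnlyDeleteCheck {
  // Hypothetical per-file stats: lower and upper bound of the "id" column.
  record FileStats(int minId, int maxId) {}

  // Inclusive check: could this file contain a row with id == value?
  static boolean mightMatch(FileStats f, int value) {
    return f.minId() <= value && value <= f.maxId();
  }

  // Strict check: do all rows in this file have id == value?
  static boolean allMatch(FileStats f, int value) {
    return f.minId() == value && f.maxId() == value;
  }

  // A delete is metadata-only if every file that might match also matches
  // entirely, so whole files can be dropped without rewriting any data.
  // An empty table has no files, so this returns true trivially.
  static boolean canDeleteUsingMetadata(List<FileStats> files, int value) {
    return files.stream()
        .filter(f -> mightMatch(f, value))
        .allMatch(f -> allMatch(f, value));
  }

  public static void main(String[] args) {
    System.out.println(canDeleteUsingMetadata(List.of(), 1));                     // true
    System.out.println(canDeleteUsingMetadata(List.of(new FileStats(1, 2)), 1)); // false
  }
}
```

   The empty-table case passes trivially, which is the window the flaky test falls into. Once a file holding (1, 2) exists, the delete can no longer be metadata-only.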
   
   One fix is to make the test also accept this potential failure. Another is
to make sure there is some pre-existing data, which always forces the delete
onto the distributed delta-delete path. This PR takes the second approach.
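   The rare interleaving, and the reason pre-seeding avoids it, can be modeled deterministically. This sketch uses the same kind of simplified min/max rule for the metadata-only decision, which is an assumption rather than Iceberg's code. It models only the schedule where the append commits between the delete's planning and its commit.

   DeleteAppendRace, runSchedule and seedTable are hypothetical names. The sketch needs Java 16+ for records.

```java
import java.util.ArrayList;
import java.util.List;

public class DeleteAppendRace {
  // Hypothetical per-file stats for the "id" column.
  record FileStats(int minId, int maxId) {}

  static boolean mightMatch(FileStats f, int v) {
    return f.minId() <= v && v <= f.maxId();
  }

  static boolean allMatch(FileStats f, int v) {
    return f.minId() == v && f.maxId() == v;
  }

  // Models one delete-vs-append schedule for "id == 1". Returns the path the
  // delete took, or throws IllegalArgumentException the way the metadata-only
  // path surfaces a commit-time validation failure.
  static String runSchedule(boolean seedTable) {
    List<FileStats> table = new ArrayList<>();
    if (seedTable) {
      table.add(new FileStats(1, 2)); // the fix: data exists before the futures start
    }

    // 1. The delete is planned against the current table contents.
    boolean metadataOnly = table.stream()
        .filter(f -> mightMatch(f, 1))
        .allMatch(f -> allMatch(f, 1));

    // 2. The append commits a file holding (1, 2) before the delete commits.
    table.add(new FileStats(1, 2));

    if (!metadataOnly) {
      // The test expects any conflict on this path to surface as SparkException.
      return "distributed";
    }

    // 3. The metadata-only delete commits against the refreshed table. A file
    //    that only partially matches cannot be dropped whole, so it fails.
    for (FileStats f : table) {
      if (mightMatch(f, 1) && !allMatch(f, 1)) {
        throw new IllegalArgumentException(
            "Failed to cleanly delete data files matching: ref(name=\"id\") == 1");
      }
    }
    return "metadata-only";
  }

  public static void main(String[] args) {
    System.out.println(runSchedule(true)); // distributed
    try {
      runSchedule(false);
    } catch (IllegalArgumentException e) {
      System.out.println("unseeded: " + e.getMessage());
    }
  }
}
```

   With a seeded table the metadata-only check fails at planning time, so the delete never takes the path that throws IllegalArgumentException, however the futures interleave.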


-- 
This is an automated message from the Apache Git Service.