szehon-ho opened a new pull request #4123:
URL: https://github.com/apache/iceberg/pull/4123
Tries to fix #4090
(Copied from my comment there)
From the log, the exception still seems to be a validation exception, just
not from the distributed Spark delta-writer job as expected, but rather from
the non-distributed metadata-only delete. Ref:
Expected: an instance of org.apache.spark.SparkException
but: <java.lang.IllegalArgumentException: Failed to cleanly delete data files matching: ref(name="id") == 1> is a java.lang.IllegalArgumentException
at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
That's from here (the metadata-only version):
https://github.com/apache/iceberg/blob/master/spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/source/SparkTable.java#L271
Although most of the time it will hit the distributed delta delete, there is
a chance it will fall back to this metadata-only version. This happens if the
optimizer (OptimizeMetadataOnlyDeleteFromTable in particular) believes that the
delete can be handled with just metadata. The appendFuture tries to avert this
by writing a file with two rows (1, 2); the deleteFuture only ever hits (1), so
it decides it cannot use a metadata-only delete. But if the appendFuture has not
run yet, the table is empty and the deleteFuture decides it can proceed with a
metadata-only delete. Most of those times, that delete goes through fine because
there is nothing to do, and the test gets another try at the right exception.
In a very small percentage of runs, however, the appendFuture lands right before
the delete commits, and the delete then fails in the code path above.
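The planning decision described above can be sketched with a simplified model. This is not Iceberg's or Spark's actual code: the `DataFile` class (holding only min/max stats for `id`), `choosePath`, and the "every file is either untouched or fully matched" rule are hypothetical stand-ins for how a metadata-only delete of `id == 1` can be decided from file statistics alone.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class DeletePathSketch {
    // Hypothetical data file: only min/max column stats for "id".
    static final class DataFile {
        final int minId;
        final int maxId;

        DataFile(int minId, int maxId) {
            this.minId = minId;
            this.maxId = maxId;
        }
    }

    enum DeletePath { METADATA_ONLY, DISTRIBUTED }

    // A file can be handled by metadata alone for "id == value" if no row can
    // match (value outside [min, max]) or every row must match (min == max == value).
    static boolean handledByMetadata(DataFile f, int value) {
        boolean noneMatch = value < f.minId || value > f.maxId;
        boolean allMatch = f.minId == value && f.maxId == value;
        return noneMatch || allMatch;
    }

    // Any partially matching file forces the distributed delete; an empty
    // table trivially qualifies for the metadata-only path.
    static DeletePath choosePath(List<DataFile> files, int value) {
        for (DataFile f : files) {
            if (!handledByMetadata(f, value)) {
                return DeletePath.DISTRIBUTED;
            }
        }
        return DeletePath.METADATA_ONLY;
    }

    public static void main(String[] args) {
        List<DataFile> empty = Collections.emptyList();
        List<DataFile> appended = Arrays.asList(new DataFile(1, 2)); // rows (1, 2)
        System.out.println(choosePath(empty, 1));    // METADATA_ONLY
        System.out.println(choosePath(appended, 1)); // DISTRIBUTED
    }
}
```

The empty-table case is the one that slips through: with no files to inspect, the model (like the optimizer described above) sees nothing that prevents a metadata-only delete.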
One fix is to also accept this potential failure in the expected-exception
checks. Another fix is to make sure there is some pre-existing data that forces
it to always use the distributed delta delete, which is what this PR does.
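The race and the effect of pre-existing data can be sketched with the same kind of simplified model. Everything here is hypothetical: `delete` plans against the files visible at planning time and re-checks the filter against the files present at commit, throwing an `IllegalArgumentException` with a message shaped like the one in the log. The seed file's contents, ids (1, 2), are an assumption; the point is only that the seed must be a file the delete cannot drop or skip whole.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class DeleteRaceSketch {
    // Hypothetical data file with min/max stats for the "id" column.
    static final class DataFile {
        final int minId;
        final int maxId;

        DataFile(int minId, int maxId) {
            this.minId = minId;
            this.maxId = maxId;
        }
    }

    // True if every file is either untouched by "id == value" or fully matched by it.
    static boolean cleanlyDeletable(List<DataFile> files, int value) {
        for (DataFile f : files) {
            boolean noneMatch = value < f.minId || value > f.maxId;
            boolean allMatch = f.minId == value && f.maxId == value;
            if (!noneMatch && !allMatch) {
                return false;
            }
        }
        return true;
    }

    // Plans against atPlanning, then commits against atCommit, which may
    // include a file appended concurrently in between.
    static String delete(List<DataFile> atPlanning, List<DataFile> atCommit, int value) {
        if (!cleanlyDeletable(atPlanning, value)) {
            return "distributed"; // the path the test expects to exercise
        }
        // Metadata-only path: the filter is re-checked against the files at commit.
        if (!cleanlyDeletable(atCommit, value)) {
            throw new IllegalArgumentException(
                "Failed to cleanly delete data files matching: id == " + value);
        }
        return "metadata-only";
    }

    static List<DataFile> plus(List<DataFile> files, DataFile extra) {
        List<DataFile> result = new ArrayList<>(files);
        result.add(extra);
        return result;
    }

    public static void main(String[] args) {
        DataFile appended = new DataFile(1, 2); // what the append writes: ids (1, 2)
        List<DataFile> empty = Collections.emptyList();

        // Unseeded table: planning sees no files, but the append lands before commit.
        try {
            delete(empty, plus(empty, appended), 1);
        } catch (IllegalArgumentException e) {
            System.out.println("unseeded: " + e.getMessage());
        }

        // Seeded table: a pre-existing partially matching file forces the distributed path.
        List<DataFile> seeded = plus(empty, new DataFile(1, 2));
        System.out.println("seeded: " + delete(seeded, plus(seeded, appended), 1));
    }
}
```

Seeding with a file that only contains non-matching ids (say, only id 3) would not help in this model, since such a file can be skipped by metadata alone.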