mxm opened a new pull request, #15147:
URL: https://github.com/apache/iceberg/pull/15147

   Under high load, caused by limited CI resources and parallel test execution, 
the test pipeline can trigger multiple Flink checkpoints before all records have 
been processed. This produces a slightly different, but semantically valid, 
outcome.
   
   In the test failure reported below, no positional delete is written because 
the Flink checkpoint cuts off the writer optimization that converts an 
equality delete into a positional delete.
   
   ```
   TestDynamicIcebergSink > testCommitsOncePerTableBranchAndCheckpoint() FAILED
       java.lang.AssertionError: 
       Expecting map:
         {"added-data-files"="1", "added-delete-files"="1", 
"added-equality-delete-files"="1", "added-equality-deletes"="1", 
"added-files-size"="1128", "added-records"="1", "changed-partition-count"="1", 
"engine-name"="flink", "engine-version"="2.1.0", 
"flink.job-id"="7a390dfdf54bc3d7028708e2aa46c54c", 
"flink.max-committed-checkpoint-id"="2", 
"flink.operator-id"="27651a86dc01e60882f6b181172fad4c", 
"iceberg-version"="Apache Iceberg aa63060 (commit 
aa630608f79d06c1c3929801fd7266b082170758)", "total-data-files"="5", 
"total-delete-files"="2", "total-equality-deletes"="2", 
"total-files-size"="4478", "total-position-deletes"="0", "total-records"="6"}
       to contain entries:
         ["total-equality-deletes"="1",
           "total-position-deletes"="1",
           "total-records"="6"]
       but the following map entries had different values:
         ["total-equality-deletes"="2" (expected: "1"), 
"total-position-deletes"="0" (expected: "1")]
           at 
org.apache.iceberg.flink.sink.dynamic.TestDynamicIcebergSink.testCommitsOncePerTableBranchAndCheckpoint(TestDynamicIcebergSink.java:1005)
   ```
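
To make the checkpoint dependence concrete, here is a minimal toy model in plain Java. It is not Iceberg or Flink code: the class `CheckpointDeleteSketch`, its methods, and the operation sequence are all hypothetical. The only behaviour it assumes is the one described above: a delete can become a positional delete only if the matching insert happened since the last checkpoint, and otherwise falls back to an equality delete.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model (hypothetical, not Iceberg code) of checkpoint-dependent delete types. */
final class CheckpointDeleteSketch {
  // Positions of rows inserted since the last checkpoint, keyed by equality key.
  private final Map<String, Integer> insertedSinceCheckpoint = new HashMap<>();
  private int nextPosition = 0;

  private int records = 0;
  private int equalityDeletes = 0;
  private int positionDeletes = 0;

  void insert(String key) {
    insertedSinceCheckpoint.put(key, nextPosition++);
    records++;
  }

  void delete(String key) {
    if (insertedSinceCheckpoint.remove(key) != null) {
      // The row was written since the last checkpoint, so its position is known.
      positionDeletes++;
    } else {
      // The position is unknown to this writer: fall back to an equality delete.
      equalityDeletes++;
    }
  }

  void checkpoint() {
    // Assumption: a checkpoint flushes the writer, so tracked positions are lost.
    insertedSinceCheckpoint.clear();
    nextPosition = 0;
  }

  /** Returns {records, equalityDeletes, positionDeletes} for a fixed sequence. */
  static int[] run(boolean checkpointInBetween) {
    CheckpointDeleteSketch writer = new CheckpointDeleteSketch();
    writer.delete("a"); // key not seen by this writer yet
    writer.insert("a");
    if (checkpointInBetween) {
      writer.checkpoint();
    }
    writer.delete("a"); // positional only if no checkpoint intervened
    writer.insert("a");
    return new int[] {writer.records, writer.equalityDeletes, writer.positionDeletes};
  }

  public static void main(String[] args) {
    for (boolean split : new boolean[] {false, true}) {
      int[] r = run(split);
      System.out.println("checkpointInBetween=" + split
          + " records=" + r[0]
          + " equalityDeletes=" + r[1]
          + " positionDeletes=" + r[2]);
    }
  }
}
```

In this sketch the record count is the same in both runs and only the split between equality and positional deletes changes, which is why asserting exact delete counts is sensitive to checkpoint timing.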
   
   The important part of the test is to ensure that we commit once per 
checkpoint, which is asserted further below.
   
   This closes https://github.com/apache/iceberg/issues/15139.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

