zhongyujiang commented on a change in pull request #4117:
URL: https://github.com/apache/iceberg/pull/4117#discussion_r808753242



##########
File path: flink/v1.14/flink/src/test/java/org/apache/iceberg/flink/TestFlinkTableSink.java
##########
@@ -277,15 +277,15 @@ public void testHashDistributeMode() throws Exception {
       ));
 
       // Sometimes we will have more than one checkpoint if we pass the auto checkpoint interval,
-      // thus producing multiple snapshots.  Here we assert that each snapshot has only 1 file per partition.
+      // thus producing multiple snapshots.  Here we assert that each snapshot has no more than 1 file per partition.
       Map<Long, List<DataFile>> snapshotToDataFiles = SimpleDataUtil.snapshotToDataFiles(table);
       for (List<DataFile> dataFiles : snapshotToDataFiles.values()) {
-        Assert.assertEquals("There should be 1 data file in partition 'aaa'", 1,
-            SimpleDataUtil.matchingPartitions(dataFiles, table.spec(), ImmutableMap.of("data", "aaa")).size());
-        Assert.assertEquals("There should be 1 data file in partition 'bbb'", 1,
-            SimpleDataUtil.matchingPartitions(dataFiles, table.spec(), ImmutableMap.of("data", "bbb")).size());
-        Assert.assertEquals("There should be 1 data file in partition 'ccc'", 1,
-            SimpleDataUtil.matchingPartitions(dataFiles, table.spec(), ImmutableMap.of("data", "ccc")).size());
+        Assert.assertTrue("There should be no more than 1 data file in partition 'aaa'",
+            SimpleDataUtil.matchingPartitions(dataFiles, table.spec(), ImmutableMap.of("data", "aaa")).size() < 2);

Review comment:
       I couldn't find a way to validate hash distribution when the results of multiple checkpoints are merged, so originally I simply disabled this test in streaming mode.
   This is an update for blue's comment:
   
   > I think I would prefer a fix that avoids the root cause but still runs the test in streaming mode. I understand your concern about not being able to necessarily guarantee we won't have a flaky test, but we can probably set that high enough (1s?) that we don't see it in practice.
   
   I increased the checkpoint interval to 1000ms to reduce the chance of merged results. And in streaming mode, I think the original assertion is not correct given the checkpoint scenario mentioned in my last comment.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


