zhongyujiang commented on a change in pull request #4117:
URL: https://github.com/apache/iceberg/pull/4117#discussion_r808753242
##########
File path:
flink/v1.14/flink/src/test/java/org/apache/iceberg/flink/TestFlinkTableSink.java
##########
@@ -277,15 +277,15 @@ public void testHashDistributeMode() throws Exception {
));
// Sometimes we will have more than one checkpoint if we pass the auto
checkpoint interval,
- // thus producing multiple snapshots. Here we assert that each snapshot
has only 1 file per partition.
+ // thus producing multiple snapshots. Here we assert that each snapshot
has no more than 1 file per partition.
Map<Long, List<DataFile>> snapshotToDataFiles =
SimpleDataUtil.snapshotToDataFiles(table);
for (List<DataFile> dataFiles : snapshotToDataFiles.values()) {
- Assert.assertEquals("There should be 1 data file in partition 'aaa'",
1,
- SimpleDataUtil.matchingPartitions(dataFiles, table.spec(),
ImmutableMap.of("data", "aaa")).size());
- Assert.assertEquals("There should be 1 data file in partition 'bbb'",
1,
- SimpleDataUtil.matchingPartitions(dataFiles, table.spec(),
ImmutableMap.of("data", "bbb")).size());
- Assert.assertEquals("There should be 1 data file in partition 'ccc'",
1,
- SimpleDataUtil.matchingPartitions(dataFiles, table.spec(),
ImmutableMap.of("data", "ccc")).size());
+ Assert.assertTrue("There should be no more than 1 data file in
partition 'aaa'",
+ SimpleDataUtil.matchingPartitions(dataFiles, table.spec(),
ImmutableMap.of("data", "aaa")).size() < 2);
Review comment:
I didn't figure out a way to validate hash distribution when there have
merged results of multiple ckpts, originally I simply disabled this test
running in streaming mode.
This is an update for blue's comment,
> I think I would prefer a fix that avoids the root cause but still runs the
test in streaming mode. I understand your concern about not being able to
necessarily guarantee we won't have a flaky test, but we can probably set that
high enough (1s?) that we don't see it in practice.
I improved ck interval to 1000ms to reduce merge results possibility. And in
streaming mode, I think the original assert is not right given the checkpoint
scenario mentioned in my last comment.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]