openinx commented on pull request #4117: URL: https://github.com/apache/iceberg/pull/4117#issuecomment-1042718844
But I generally don't think the current fix is in the correct direction. Here are my points:

* Indeed, increasing the checkpoint interval from 400ms to 1000ms reduces the probability of hitting this assertion failure, but it does not resolve the underlying problem. So I don't think increasing the checkpoint interval is the right fix.
* Asserting that the snapshot's data file count is less than 2 does not change anything, in my view.

I think the real intention behind this unit test is: once we enable the `write.distribution-mode=hash` switch in both flink streaming & batch jobs, we want to ensure that only one data file is generated per partition when all the rows are committed in a single deterministic iceberg transaction.

The real root cause is: we cannot guarantee that only one checkpoint is triggered for the given 9 rows in the flink streaming SQL job. So I think the correct direction is: make the job write those 9 rows within a single checkpoint, and then still assert that there is only one data file in each given partition.

To accomplish this, I think we can use the [BoundedTestSource](https://github.com/apache/iceberg/blob/2d4b0ddc76fd47aa27ca4972d4f3a6f256921c58/flink/v1.14/flink/src/test/java/org/apache/iceberg/flink/source/BoundedTestSource.java#L49) to reimplement this unit test. About the BoundedTestSource, [here](https://github.com/apache/iceberg/blob/2d4b0ddc76fd47aa27ca4972d4f3a6f256921c58/flink/v1.14/flink/src/test/java/org/apache/iceberg/flink/sink/TestFlinkIcebergSinkV2.java#L156) is a good example of how to produce multiple rows within a single checkpoint.
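Here is a minimal sketch of what I mean, assuming the `BoundedTestSource(List<List<T>>)` constructor where each inner list is emitted within one checkpoint cycle (as in the `TestFlinkIcebergSinkV2` example linked above); the two-column row schema and the class name are hypothetical placeholders, and the sink wiring is left to the existing test:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.typeutils.RowTypeInfo;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.types.Row;
import org.apache.iceberg.flink.source.BoundedTestSource;

public class SingleCheckpointSketch {

  public static void main(String[] args) throws Exception {
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

    // The checkpoint interval no longer matters for correctness: the source decides
    // which rows belong to which checkpoint, so the original 400ms can stay.
    env.enableCheckpointing(400);

    // One inner list per checkpoint cycle. Putting all 9 rows into a single inner
    // list means they are all emitted before the first (and only) checkpoint
    // completes, so the sink commits them in one iceberg transaction.
    List<List<Row>> elementsPerCheckpoint = Collections.singletonList(
        Arrays.asList(
            Row.of(1, "aaa"), Row.of(1, "bbb"), Row.of(1, "ccc"),
            Row.of(2, "aaa"), Row.of(2, "bbb"), Row.of(2, "ccc"),
            Row.of(3, "aaa"), Row.of(3, "bbb"), Row.of(3, "ccc")));

    // Hypothetical two-column schema (id INT, data STRING), just for illustration;
    // the real test would use its existing row type info.
    RowTypeInfo rowTypeInfo = new RowTypeInfo(Types.INT, Types.STRING);

    DataStream<Row> dataStream =
        env.addSource(new BoundedTestSource<>(elementsPerCheckpoint), rowTypeInfo);

    // ... register dataStream as a table (or use the flink sink directly) with
    // write.distribution-mode=hash, run env.execute(), and then assert that the
    // committed snapshot contains exactly one data file per partition.
  }
}
```

With the source controlling the row-to-checkpoint alignment, the test becomes deterministic regardless of the checkpoint interval, and the "exactly one data file per partition" assertion checks the property we actually care about.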
