openinx commented on pull request #4117:
URL: https://github.com/apache/iceberg/pull/4117#issuecomment-1042718844


   But I generally don't think the current fix is headed in the right direction; here are my points:
   * Indeed, increasing the checkpoint interval from 400ms to 1000ms reduces the probability of hitting this assertion failure, but it does not resolve the real underlying problem. So I don't think increasing the checkpoint interval is the right fix.
   * Asserting that the snapshot has fewer than 2 data files does not change anything in my view.
   
   I think the real intention behind this unit test is: we want to ensure that only one data file is generated in each given partition when we commit those rows in a single deterministic Iceberg transaction, once we enable the switch `write.distribution-mode=hash` in both Flink streaming & batch jobs (see the sketch below).
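   
   For context, here is a minimal sketch of flipping that switch via Iceberg's Java table API (the class and method names are hypothetical illustrations, not code from this PR; the real test configures the property when creating the table):
   
   ```java
   import org.apache.iceberg.Table;
   import org.apache.iceberg.TableProperties;
   
   class HashDistributionSketch {
     // Hash distribution shuffles rows by partition key, so all rows of a given
     // partition land on the same writer task; that is what makes the
     // one-data-file-per-partition-per-commit assertion achievable.
     static void enableHashDistribution(Table table) {
       table
           .updateProperties()
           .set(TableProperties.WRITE_DISTRIBUTION_MODE, "hash")
           .commit();
     }
   }
   ```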
   
   The real root cause is: we cannot guarantee that only one checkpoint is triggered for the given 9 rows in the Flink streaming SQL job. So I think the correct direction is: make a single checkpoint write those 9 rows, and then still assert that there is only one data file in each given partition. To accomplish this goal, I think we can use the [BoundedTestSource](https://github.com/apache/iceberg/blob/2d4b0ddc76fd47aa27ca4972d4f3a6f256921c58/flink/v1.14/flink/src/test/java/org/apache/iceberg/flink/source/BoundedTestSource.java#L49) to reimplement this unit test. About BoundedTestSource, [here](https://github.com/apache/iceberg/blob/2d4b0ddc76fd47aa27ca4972d4f3a6f256921c58/flink/v1.14/flink/src/test/java/org/apache/iceberg/flink/sink/TestFlinkIcebergSinkV2.java#L156) is a good example of how to produce multiple rows within a single checkpoint.
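   
   For illustration, a minimal sketch of that approach (the 9 sample rows, the `(INT, STRING)` row schema, and the class/method names are my assumptions, not the actual test; it relies on the BoundedTestSource constructor that takes the elements to emit per checkpoint, where each inner list is flushed before the next checkpoint barrier):
   
   ```java
   import java.util.List;
   import org.apache.flink.api.common.typeinfo.Types;
   import org.apache.flink.api.java.typeutils.RowTypeInfo;
   import org.apache.flink.streaming.api.datastream.DataStream;
   import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
   import org.apache.flink.types.Row;
   import org.apache.iceberg.flink.source.BoundedTestSource;
   import org.apache.iceberg.relocated.com.google.common.collect.ImmutableList;
   
   class SingleCheckpointSourceSketch {
   
     static DataStream<Row> nineRowsInOneCheckpoint(StreamExecutionEnvironment env) {
       // Each inner list is the batch of rows emitted before one checkpoint.
       // Putting all 9 rows into a single inner list means exactly one
       // checkpoint (and therefore one Iceberg commit) covers them, so we can
       // still assert a single data file per partition.
       List<List<Row>> elementsPerCheckpoint =
           ImmutableList.of(
               ImmutableList.of(
                   Row.of(1, "aaa"), Row.of(1, "bbb"), Row.of(1, "ccc"),
                   Row.of(2, "aaa"), Row.of(2, "bbb"), Row.of(2, "ccc"),
                   Row.of(3, "aaa"), Row.of(3, "bbb"), Row.of(3, "ccc")));
   
       return env.addSource(
           new BoundedTestSource<>(elementsPerCheckpoint),
           new RowTypeInfo(Types.INT, Types.STRING));
     }
   }
   ```
   
   The sink side would then be wired up the same way as the linked TestFlinkIcebergSinkV2 example.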




