zhongyujiang commented on pull request #4117:
URL: https://github.com/apache/iceberg/pull/4117#issuecomment-1042862294


   > Assert that the snapshot's data file size is less than 2 does not change 
any thing in my view.
   
   Changing the assertion condition is actually not intended to solve the 
validation problem when results are merged. As you said, there can be more 
than one checkpoint in streaming mode, and there is no guarantee that each 
checkpoint contains data for every partition. The situation could look like 
this:
   - checkpoint#1 
       - (1, 'aaa')
   - checkpoint#2
       - (1, 'bbb')
   ...
   
   When the results of ck1 and ck2 are not merged, the snapshot for ck1 has 
only 1 data file for partition `aaa` and 0 files for the other partitions, and 
the snapshot for ck2 is similar. That's why I changed the assertion condition.
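
   To make the scenario above concrete, here is a toy model in plain Java. It 
does not use any Flink or Iceberg classes; `CheckpointFileCounts` and 
`filesPerSnapshot` are hypothetical names, and it assumes that each checkpoint 
is committed as its own snapshot and that all rows of one partition within a 
checkpoint end up in a single data file.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: no Flink or Iceberg classes are involved.
class CheckpointFileCounts {

  // Each checkpoint is given as the list of partition values of its rows.
  // Assumption: one snapshot per checkpoint (results are not merged), and one
  // data file per partition that received at least one row in that checkpoint.
  static List<Map<String, Integer>> filesPerSnapshot(
      List<List<String>> checkpoints, List<String> partitions) {
    List<Map<String, Integer>> snapshots = new ArrayList<>();
    for (List<String> checkpoint : checkpoints) {
      Map<String, Integer> files = new LinkedHashMap<>();
      for (String partition : partitions) {
        files.put(partition, 0); // partitions without rows get 0 files
      }
      for (String partition : checkpoint) {
        files.put(partition, 1); // rows of the same partition share one file
      }
      snapshots.add(files);
    }
    return snapshots;
  }

  public static void main(String[] args) {
    List<Map<String, Integer>> snapshots = filesPerSnapshot(
        List.of(List.of("aaa"), List.of("bbb")), // ck1: (1, 'aaa'), ck2: (1, 'bbb')
        List.of("aaa", "bbb", "ccc"));
    for (Map<String, Integer> files : snapshots) {
      System.out.println(files);
    }
    // prints {aaa=1, bbb=0, ccc=0} and then {aaa=0, bbb=1, ccc=0}
  }
}
```

   In this model an "exactly 1 file per partition" check fails for every 
snapshot, while a "fewer than 2 files per partition" check passes.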
   
   > To accomplish this goal, I think we can use the 
[BoundedTestSource](https://github.com/apache/iceberg/blob/2d4b0ddc76fd47aa27ca4972d4f3a6f256921c58/flink/v1.14/flink/src/test/java/org/apache/iceberg/flink/source/BoundedTestSource.java#L49)
 to reimplement this unit test. About the BoundedTestSource, 
[here](https://github.com/apache/iceberg/blob/2d4b0ddc76fd47aa27ca4972d4f3a6f256921c58/flink/v1.14/flink/src/test/java/org/apache/iceberg/flink/sink/TestFlinkIcebergSinkV2.java#L156)
 is a good example for how to producing multiple rows into a single checkpoint.
   
   I also wanted to solve the problem by controlling the checkpoints at first, 
but I couldn't figure out a convenient way to do so. Using `BoundedTestSource` 
seems like a feasible approach, so I'll try it. @openinx Thanks for your 
advice.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


