zhongyujiang commented on pull request #4117:
URL: https://github.com/apache/iceberg/pull/4117#issuecomment-1042862294
> Assert that the snapshot's data file size is less than 2 does not change any thing in my view.

Changing the assertion condition is not actually meant to solve the validation
problem when results are merged. As you said, there can be more than one
checkpoint in streaming mode, but there is no guarantee that each checkpoint
contains data for every partition. The situation could look like this:
- checkpoint#1
  - (1, 'aaa')
- checkpoint#2
  - (1, 'bbb')

...
When the results of ck1 and ck2 are not merged, the snapshot for ck1 has
only 1 data file for partition `aaa` and 0 files for the other partitions, and
the snapshot for ck2 is similar. That's why I changed the assertion condition.
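
To make the scenario concrete, here is a minimal plain-Java sketch of the file counts per snapshot. It does not use the Iceberg or Flink APIs; `CheckpointFileCounts` and `filesPerPartition` are hypothetical names, and it assumes each unmerged checkpoint writes at most one data file per partition it received rows for.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CheckpointFileCounts {

  /**
   * Counts data files per partition for the snapshot of one unmerged checkpoint.
   * Assumption (not taken from the Iceberg code): a checkpoint produces exactly
   * one data file for every partition that received at least one row.
   */
  public static Map<String, Integer> filesPerPartition(
      List<String> checkpointPartitions, List<String> allPartitions) {
    Map<String, Integer> counts = new LinkedHashMap<>();
    // Every known partition starts with zero files in this snapshot.
    for (String partition : allPartitions) {
      counts.put(partition, 0);
    }
    // A partition that saw any row in this checkpoint gets one file.
    for (String partition : checkpointPartitions) {
      counts.put(partition, 1);
    }
    return counts;
  }

  public static void main(String[] args) {
    List<String> allPartitions = Arrays.asList("aaa", "bbb");

    // checkpoint#1 only carries (1, 'aaa'); checkpoint#2 only carries (1, 'bbb').
    Map<String, Integer> ck1 = filesPerPartition(Arrays.asList("aaa"), allPartitions);
    Map<String, Integer> ck2 = filesPerPartition(Arrays.asList("bbb"), allPartitions);

    System.out.println("ck1 snapshot: " + ck1);
    System.out.println("ck2 snapshot: " + ck2);

    // Asserting "exactly 1 file per partition" fails for both snapshots,
    // while "fewer than 2 files per partition" holds for both.
    for (Map<String, Integer> snapshot : Arrays.asList(ck1, ck2)) {
      for (int files : snapshot.values()) {
        if (files >= 2) {
          throw new AssertionError("unexpected file count: " + files);
        }
      }
    }
  }
}
```

Under that assumption, an "exactly one file" check only passes when a single checkpoint happens to contain rows for every partition, which is why the relaxed condition was used.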
> To accomplish this goal, I think we can use the
> [BoundedTestSource](https://github.com/apache/iceberg/blob/2d4b0ddc76fd47aa27ca4972d4f3a6f256921c58/flink/v1.14/flink/src/test/java/org/apache/iceberg/flink/source/BoundedTestSource.java#L49)
> to reimplement this unit test. About the BoundedTestSource,
> [here](https://github.com/apache/iceberg/blob/2d4b0ddc76fd47aa27ca4972d4f3a6f256921c58/flink/v1.14/flink/src/test/java/org/apache/iceberg/flink/sink/TestFlinkIcebergSinkV2.java#L156)
> is a good example of how to produce multiple rows in a single checkpoint.

I also wanted to solve the problem by controlling the checkpoints at
first, but I couldn't figure out a convenient way to do so. Using
`BoundedTestSource` seems feasible, so I'll try it. @openinx
Thanks for your advice.