stevenzwu commented on pull request #4189:
URL: https://github.com/apache/iceberg/pull/4189#issuecomment-1048429340
@openinx I have run the test hundreds of times locally and was never able to
reproduce it.
Thinking again about the root cause we discussed in the issue, where we may
miss the notifyCheckpointComplete callback, I am not sure this change to
BoundedTestSource will help. The problem is not how many rows land in one
checkpoint cycle. The real issue is that two checkpoint cycles got squashed
into one Iceberg commit, so that commit contains 2 files for a partition.
Given that, we can never assert on file count.
The assertion is meant to verify that rows with the same partition value are
written by only a single writer task. Asserting on file count is a little
hacky/fragile. Maybe we can leverage the naming convention of the data file
(which includes a subtaskId part) instead.
```
private String generateFilename() {
  return format.addExtension(
      String.format("%05d-%d-%s-%05d", partitionId, taskId, operationId,
          fileCount.incrementAndGet()));
}
```
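To make the idea concrete, here is a rough sketch of what such a check could look like. It only assumes the `%05d-%d-%s-%05d` file name layout shown above. The class and method names are hypothetical, and which of the two leading numeric fields actually carries the Flink subtask index is left as a parameter, since that would need to be confirmed against how the Flink writer builds its file factory.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical test helper, not part of Iceberg.
class SingleWriterPerPartitionCheck {

  // Extracts one of the two leading numeric fields from a data file name
  // produced with the "%05d-%d-%s-%05d" pattern. fieldIndex 0 is the first
  // field (partitionId), fieldIndex 1 is the second field (taskId).
  // The split limit of 3 keeps any hyphens inside operationId untouched.
  static long leadingField(String fileName, int fieldIndex) {
    if (fieldIndex != 0 && fieldIndex != 1) {
      throw new IllegalArgumentException("fieldIndex must be 0 or 1");
    }
    String[] parts = fileName.split("-", 3);
    if (parts.length < 3) {
      throw new IllegalArgumentException("Unexpected file name: " + fileName);
    }
    return Long.parseLong(parts[fieldIndex]);
  }

  // Returns true if, for every partition, all data files were written by the
  // same writer (same value in the chosen field), regardless of how many
  // files the partition has.
  static boolean eachPartitionHasSingleWriter(
      Map<String, List<String>> fileNamesByPartition, int fieldIndex) {
    for (List<String> fileNames : fileNamesByPartition.values()) {
      Set<Long> writers = new HashSet<>();
      for (String fileName : fileNames) {
        writers.add(leadingField(fileName, fieldIndex));
      }
      if (writers.size() > 1) {
        return false;
      }
    }
    return true;
  }
}
```

A check like this would still pass when two checkpoint cycles end up in one Iceberg commit, because it looks at who wrote the files and not at how many files there are.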
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]