stevenzwu edited a comment on pull request #4189:
URL: https://github.com/apache/iceberg/pull/4189#issuecomment-1048429340


   @openinx I have run the test hundreds of times locally in a test loop like 
you did before and was never able to reproduce it.
   
   Thinking again about the root cause that we discussed in the issue, where we 
may miss the notifyCheckpointComplete callback, I am not sure this change to 
BoundedTestSource will help. It is not about how many rows are in one checkpoint 
cycle. The real issue is that two checkpoint cycles got squashed into one 
Iceberg commit, which therefore contains 2 files for a partition. 
Under that assumption, we can never assert on the file count.
   
   The assertion is there to verify that rows with the same partition value are 
only written by a single writer task. It is a little hacky/fragile. Maybe we can 
leverage the naming convention of the data file (with the subtaskId part).
   
   ```
     private String generateFilename() {
       return format.addExtension(
           String.format("%05d-%d-%s-%05d", partitionId, taskId, operationId, fileCount.incrementAndGet()));
     }
   ``` 
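
   A minimal sketch of how such a check might look, assuming the leading zero-padded field of the file name identifies the writer subtask. The class `WriterPerPartitionCheck`, its method names, and the partition-to-file-names map are hypothetical and not part of the Iceberg or Flink API; a real test would have to collect the data file paths per partition from the table first.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper, not part of Iceberg: checks that every partition's
// data files were produced by exactly one writer, based on the file name.
public class WriterPerPartitionCheck {

  // Parses the first "-"-separated field of the file name as the writer id.
  // Assumption: names follow "%05d-%d-%s-%05d" and the first field is the
  // writer subtask. Only the first field is read, so hyphens inside the
  // operationId do not matter.
  static int writerIdOf(String path) {
    String name = path.substring(path.lastIndexOf('/') + 1);
    int dash = name.indexOf('-');
    if (dash <= 0) {
      throw new IllegalArgumentException("Unexpected data file name: " + name);
    }
    return Integer.parseInt(name.substring(0, dash));
  }

  // Groups the distinct writer ids seen for each partition.
  static Map<String, Set<Integer>> writersByPartition(Map<String, List<String>> filesByPartition) {
    Map<String, Set<Integer>> result = new TreeMap<>();
    for (Map.Entry<String, List<String>> entry : filesByPartition.entrySet()) {
      Set<Integer> writers = new TreeSet<>();
      for (String file : entry.getValue()) {
        writers.add(writerIdOf(file));
      }
      result.put(entry.getKey(), writers);
    }
    return result;
  }

  // True when no partition has files from more than one writer, regardless
  // of how many files each partition contains.
  static boolean eachPartitionHasSingleWriter(Map<String, List<String>> filesByPartition) {
    for (Set<Integer> writers : writersByPartition(filesByPartition).values()) {
      if (writers.size() > 1) {
        return false;
      }
    }
    return true;
  }
}
```

   Because the check looks only at which writer produced each file, it should still pass when two checkpoint cycles are squashed into one commit and a partition ends up with 2 files from the same writer.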
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


