zhongyujiang commented on PR #10567:
URL: https://github.com/apache/iceberg/pull/10567#issuecomment-2295217325

   > What if we test the reader operator recovery by using 
OneInputStreamOperatorTestHarness? We can store the state of the operator, and 
later use it to restore the reader
   
   Hi @pvary, sorry for the late reply.
   
   I found that I cannot directly use `OneInputStreamOperatorTestHarness` for 
testing. We currently use it to test the `StreamingReaderOperator`, but the 
input for `StreamingReaderOperator` is a `FlinkInputSplit` (corresponding to a 
`CombinedScanTask`), and the operator processes at least one complete split 
each time. This means it never takes a checkpoint while a split is only 
partially consumed.
   
   I looked into whether there might be other utility classes that would allow 
me to write an end-to-end unit test, but I couldn't find any. Do you have any 
other suggestions?
   
   Additionally, I added a test, `testDataIteratorWithResidualFilter`, in the 
new commit. It correlates the values in the records with the order of their 
files in the task, to show that the `fileOffset` in `DataIterator` can be 
incorrect while reading records (and hence the checkpoint would also record 
incorrect information). I believe this also demonstrates the bug in 
`DataIterator`. Could you please take a look? Thanks!
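
   To illustrate the idea behind that check, here is a rough, self-contained 
sketch. It is not the actual test and does not use any Iceberg classes: 
`FileOffsetSketch`, `TrackingIterator`, `buildFiles` and `findMismatches` are 
hypothetical names, and the "files" are plain in-memory lists. Each record 
value encodes the index of the file it belongs to, so the file offset reported 
after every `next()` call can be compared with the file the record really came 
from, even when a residual filter drops records or whole files.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.IntPredicate;

class FileOffsetSketch {

  /** Hypothetical stand-in for an iterator over the files of one task. */
  static class TrackingIterator implements Iterator<Integer> {
    private final List<List<Integer>> files;
    private final IntPredicate residual;
    private int fileOffset = 0; // index of the file the cursor is currently in
    private int posInFile = 0;  // next position to examine in that file

    TrackingIterator(List<List<Integer>> files, IntPredicate residual) {
      this.files = files;
      this.residual = residual;
    }

    @Override
    public boolean hasNext() {
      // Skip records rejected by the residual filter and files with nothing left.
      while (fileOffset < files.size()) {
        List<Integer> file = files.get(fileOffset);
        while (posInFile < file.size() && !residual.test(file.get(posInFile))) {
          posInFile++;
        }
        if (posInFile < file.size()) {
          return true;
        }
        fileOffset++;
        posInFile = 0;
      }
      return false;
    }

    @Override
    public Integer next() {
      if (!hasNext()) {
        throw new NoSuchElementException();
      }
      // The cursor stays in the same file, so fileOffset still matches this record.
      return files.get(fileOffset).get(posInFile++);
    }

    int fileOffset() {
      return fileOffset;
    }
  }

  /** Builds files whose record values are fileIndex * 100 + positionInFile. */
  public static List<List<Integer>> buildFiles(int fileCount, int recordsPerFile) {
    List<List<Integer>> files = new ArrayList<>();
    for (int f = 0; f < fileCount; f++) {
      List<Integer> file = new ArrayList<>();
      for (int i = 0; i < recordsPerFile; i++) {
        file.add(f * 100 + i);
      }
      files.add(file);
    }
    return files;
  }

  /** Returns a description of every record whose reported file offset is wrong. */
  public static List<String> findMismatches(List<List<Integer>> files, IntPredicate residual) {
    TrackingIterator it = new TrackingIterator(files, residual);
    List<String> mismatches = new ArrayList<>();
    while (it.hasNext()) {
      int value = it.next();
      int expectedFile = value / 100; // decode the file index from the value
      if (expectedFile != it.fileOffset()) {
        mismatches.add("record " + value + " reported in file " + it.fileOffset());
      }
    }
    return mismatches;
  }

  public static void main(String[] args) {
    // Residual filter that keeps only even values.
    System.out.println("mismatches: " + findMismatches(buildFiles(3, 4), v -> v % 2 == 0));
  }
}
```

An iterator that reports a wrong file offset for some record would show up as 
a non-empty list here, which is the same kind of signal the new test looks for.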


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


