zhongyujiang commented on PR #10567:
URL: https://github.com/apache/iceberg/pull/10567#issuecomment-2295217325
> What if we test the reader operator recovery by using
> `OneInputStreamOperatorTestHarness`? We can store the state of the operator, and
> later use it to restore the reader.
Hi @pvary, sorry for the late reply.
I found that I cannot use `OneInputStreamOperatorTestHarness` directly for this
test. We currently use it to test `StreamingReaderOperator`, but the input to
`StreamingReaderOperator` is a `FlinkInputSplit` (corresponding to a
`CombinedScanTask`), and the operator processes at least one complete split at a
time. This means that, with the harness, a checkpoint is never taken while a
split is only partially consumed.
I looked into whether there might be other utility classes that would allow
me to write an end-to-end unit test, but I couldn't find any. Do you have any
other suggestions?
Additionally, I added a test, `testDataIteratorWithResidualFilter`, in the new
commit. It correlates the values in the records with the order of their source
files in the task, to show that the `fileOffset` in `DataIterator` can be
incorrect while records are being read (and hence the checkpoint would also
record an incorrect position). I believe this also demonstrates the bug in
`DataIterator`. Could you please take a look? Thanks!
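The idea behind correlating record values with file order can be sketched as follows. This is not the actual test: `ResidualFilterOffsetCheck`, `FilteringIterator` and `offsetsMatch` are hypothetical names, the residual filter is modelled as a plain `Predicate` applied when a "file" is opened, and it is assumed that the reported `fileOffset` should equal the index of the file that produced the record just returned. Each file `i` holds only the value `i`, so any drift in the offset, for example when the filter empties a whole file, shows up as a mismatch.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Predicate;
import java.util.stream.Collectors;

/** Hypothetical sketch: each record stores the index of its source file. */
public class ResidualFilterOffsetCheck {

  static final class FilteringIterator implements Iterator<Integer> {
    private final List<List<Integer>> files;
    private final Predicate<Integer> residual; // stand-in for a per-file residual filter
    private Iterator<Integer> current;
    private int fileOffset = 0;

    FilteringIterator(List<List<Integer>> files, Predicate<Integer> residual) {
      this.files = files;
      this.residual = residual;
      this.current = files.isEmpty() ? Collections.emptyIterator() : open(0);
    }

    // "Opens" a file with the filter applied; the result may be empty.
    private Iterator<Integer> open(int index) {
      return files.get(index).stream().filter(residual).collect(Collectors.toList()).iterator();
    }

    @Override
    public boolean hasNext() {
      // Skipping a fully filtered file must still advance fileOffset.
      while (!current.hasNext() && fileOffset + 1 < files.size()) {
        fileOffset++;
        current = open(fileOffset);
      }
      return current.hasNext();
    }

    @Override
    public Integer next() {
      if (!hasNext()) throw new NoSuchElementException();
      return current.next();
    }

    int fileOffset() {
      return fileOffset;
    }
  }

  /** True if every record's value equals the fileOffset reported while reading it. */
  static boolean offsetsMatch(List<List<Integer>> files, Predicate<Integer> residual) {
    FilteringIterator it = new FilteringIterator(files, residual);
    while (it.hasNext()) {
      int value = it.next();
      if (value != it.fileOffset()) return false;
    }
    return true;
  }

  public static void main(String[] args) {
    // File i holds only the value i; the filter drops every record of file 1.
    List<List<Integer>> files = List.of(List.of(0, 0), List.of(1, 1), List.of(2, 2, 2));
    System.out.println(offsetsMatch(files, v -> v != 1)); // true
  }
}
```

A variant of the sketch that failed to count a skipped, fully filtered file would, for instance, report offset 1 for the records of file 2, and `offsetsMatch` would return false.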
--
This is an automated message from the Apache Git Service.