SteNicholas commented on PR #8884: URL: https://github.com/apache/hudi/pull/8884#issuecomment-1576388603
@zhuanshenbsj1, IMO, streaming read must skip clustering instants, because otherwise there are many cases that produce duplicates. For example, suppose the timeline is commit1, replacecommit1.requested, commit2, and a job starts reading from commit1. The job then fails and restarts from a checkpoint, and in the meantime replacecommit1 completes. After the restart, IncrementInputSplits reads the instants commit1, replacecommit1, and commit2, which differs from the set read before the failure (commit1 and commit2). cc @danny0405
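The scenario can be modeled with a minimal sketch. This is plain Python, not Hudi's API: the `Instant` class, `make_timeline`, `instants_to_read`, the timestamps, and the `skip_clustering` flag are all hypothetical names for illustration, and treating every replacecommit as a clustering instant is a simplifying assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instant:
    """Hypothetical stand-in for a timeline instant (not Hudi's class)."""
    timestamp: str
    action: str  # "commit" or "replacecommit"
    state: str   # "requested" or "completed"


def make_timeline(replace_state):
    """Build the timeline from the comment: commit1, replacecommit1, commit2."""
    return [
        Instant("001", "commit", "completed"),
        Instant("002", "replacecommit", replace_state),
        Instant("003", "commit", "completed"),
    ]


def instants_to_read(timeline, start_ts, skip_clustering):
    """Return timestamps of completed instants at or after start_ts.

    When skip_clustering is True, replacecommit instants are excluded
    (assumption: every replacecommit here is a clustering instant).
    """
    selected = []
    for instant in timeline:
        if instant.state != "completed" or instant.timestamp < start_ts:
            continue
        if skip_clustering and instant.action == "replacecommit":
            continue
        selected.append(instant.timestamp)
    return selected


# Before the failure, replacecommit1 is still only requested.
before_failure = make_timeline("requested")
# After the restart, replacecommit1 has completed.
after_restart = make_timeline("completed")

# Without skipping, the two runs see different instant sets.
unskipped_before = instants_to_read(before_failure, "001", skip_clustering=False)
unskipped_after = instants_to_read(after_restart, "001", skip_clustering=False)

# With skipping, both runs see the same instant set.
skipped_before = instants_to_read(before_failure, "001", skip_clustering=True)
skipped_after = instants_to_read(after_restart, "001", skip_clustering=True)
```

In this model, the unskipped read after the restart picks up the extra instant `002`, while the skipped read returns the same instants before and after the failure.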
