[ https://issues.apache.org/jira/browse/HUDI-1007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17134635#comment-17134635 ]
liujinhui commented on HUDI-1007:
---------------------------------

I think that starting from the latest offset can indeed solve this problem, but it will lose a lot of data that would otherwise have been consumed. I suggest adding a configuration option so that the user can choose:

1. Throw an exception so that the task fails, then manually modify --checkpoint to resume consumption (the default).
2. Enable a new configuration that makes consumption start from the latest offset when this happens.

Do you think this is feasible? [~vinoth]

> When earliestOffsets is greater than checkpoint, Hudi will not be able to
> successfully consume data
> ---------------------------------------------------------------------------------------------------
>
> Key: HUDI-1007
> URL: https://issues.apache.org/jira/browse/HUDI-1007
> Project: Apache Hudi
> Issue Type: Bug
> Components: DeltaStreamer
> Reporter: liujinhui
> Assignee: liujinhui
> Priority: Major
> Fix For: 0.6.0
>
> Original Estimate: 24h
> Remaining Estimate: 24h
>
> Use DeltaStreamer to consume Kafka.
> When earliestOffsets is greater than the checkpoint, Hudi will not be able to
> successfully consume data. The relevant logic is in
> org.apache.hudi.utilities.sources.helpers.KafkaOffsetGen#checkupValidOffsets:
>
>   boolean checkpointOffsetReseter = checkpointOffsets.entrySet().stream()
>       .anyMatch(offset -> offset.getValue() < earliestOffsets.get(offset.getKey()));
>   return checkpointOffsetReseter ? earliestOffsets : checkpointOffsets;
>
> Kafka data is continuously produced, which means that older data keeps
> expiring. When earliestOffsets is greater than the checkpoint, earliestOffsets
> is used instead. But by the time consumption starts, the data at those offsets
> may already have expired, so consumption fails. This repeats in an endless
> cycle. I understand that this design may be intended to avoid data loss, but
> it leads to this situation. I want to fix this problem and would like to hear
> your opinion.
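The two options proposed in the comment can be sketched as a policy applied after the expiry check quoted from checkupValidOffsets. This is a minimal, hypothetical sketch, not Hudi's actual API or configuration. The following are all assumptions introduced for illustration:

- the enum OnExpiredCheckpoint and the method resolveStartOffsets;
- the latestOffsets parameter, which would need to be fetched separately;
- keying partitions by an Integer partition number instead of a Kafka TopicPartition.

```java
import java.util.Map;

// Hypothetical sketch only: not an existing Hudi class or config.
final class OffsetResetSketch {

    // Hypothetical policy for a checkpoint that points at data Kafka has already deleted.
    public enum OnExpiredCheckpoint {
        RESET_TO_EARLIEST, // behaviour of the snippet quoted in the issue
        FAIL,              // option 1 in the comment (proposed default)
        RESET_TO_LATEST    // option 2 in the comment
    }

    // Partitions are keyed by Integer for brevity (assumption); assumes
    // earliestOffsets contains every partition present in checkpointOffsets.
    public static Map<Integer, Long> resolveStartOffsets(
            Map<Integer, Long> checkpointOffsets,
            Map<Integer, Long> earliestOffsets,
            Map<Integer, Long> latestOffsets,
            OnExpiredCheckpoint policy) {
        // Same test as the quoted code: is any checkpoint offset older than the
        // earliest offset Kafka still retains for that partition?
        boolean checkpointExpired = checkpointOffsets.entrySet().stream()
                .anyMatch(e -> e.getValue() < earliestOffsets.get(e.getKey()));
        if (!checkpointExpired) {
            return checkpointOffsets; // checkpoint still valid: resume from it
        }
        switch (policy) {
            case RESET_TO_EARLIEST:
                // Can loop forever if these offsets expire again before the fetch.
                return earliestOffsets;
            case RESET_TO_LATEST:
                // Makes progress, but skips everything between checkpoint and latest.
                return latestOffsets;
            case FAIL:
            default:
                // Stop and let the user pick a new --checkpoint by hand.
                throw new IllegalStateException(
                        "Checkpoint " + checkpointOffsets
                        + " is older than earliest available offsets " + earliestOffsets
                        + "; set --checkpoint manually to resume");
        }
    }
}
```

Each policy handles the trade-off discussed in the comment differently:

- FAIL is the default, as the comment proposes. No data is skipped silently, and the operator chooses where to resume.
- RESET_TO_LATEST breaks the endless cycle described in the issue, at the cost of dropping the data between the stale checkpoint and the latest offset.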