garyli1019 commented on a change in pull request #1377: [HUDI-663] Fix HoodieDeltaStreamer offset not handled correctly
URL: https://github.com/apache/incubator-hudi/pull/1377#discussion_r388705608
 
 

 ##########
 File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/helpers/KafkaOffsetGen.java
 ##########
 @@ -180,7 +180,7 @@ public KafkaOffsetGen(TypedProperties props) {
               .map(x -> new TopicPartition(x.topic(), x.partition())).collect(Collectors.toSet());
 
       // Determine the offset ranges to read from
-      if (lastCheckpointStr.isPresent()) {
+      if (lastCheckpointStr.isPresent() && !lastCheckpointStr.get().isEmpty()) {
 
 Review comment:
   I think this could hide some concerning errors.  
   For example, suppose the delta streamer is consuming a Kafka source and a hidden bug causes an empty checkpoint to be stored. The next run will silently ignore the empty checkpoint and reset to `LATEST`, skipping any records that were never consumed. That is data loss.
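A minimal sketch of the distinction the comment draws, assuming the caller can tell "no checkpoint" apart from "empty checkpoint": only a missing checkpoint falls back to the auto-reset policy, while an empty one fails fast instead of being treated as absent. `CheckpointGuard`, `StartMode`, and `resolveStartMode` are hypothetical names, not Hudi APIs, and `java.util.Optional` stands in for whatever optional type `lastCheckpointStr` has in Hudi.

```java
import java.util.Optional;

// Hypothetical helper (not part of Hudi): decides where the next read should start.
final class CheckpointGuard {

  // Hypothetical enum describing the two possible starting points.
  enum StartMode { FROM_CHECKPOINT, AUTO_RESET }

  private CheckpointGuard() {}

  static StartMode resolveStartMode(Optional<String> lastCheckpointStr) {
    if (!lastCheckpointStr.isPresent()) {
      // No checkpoint at all (e.g. the first run): falling back to the
      // configured auto-reset policy, such as LATEST, is expected here.
      return StartMode.AUTO_RESET;
    }
    if (lastCheckpointStr.get().isEmpty()) {
      // A checkpoint was stored but is empty, which points to a bug upstream.
      // Fail fast instead of resetting to LATEST and skipping unread records.
      throw new IllegalStateException(
          "Stored checkpoint is empty; refusing to reset Kafka offsets silently");
    }
    // Non-empty checkpoint: resume from the offsets it encodes.
    return StartMode.FROM_CHECKPOINT;
  }
}
```

Under this sketch, an empty checkpoint surfaces as an exception on the next run rather than as a silent jump to `LATEST`, so an operator can inspect the stored checkpoint and decide explicitly whether resetting offsets is acceptable.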

----------------------------------------------------------------
This is an automated message from the Apache Git Service.