jihoonson opened a new issue #6124: KafkaIndexTask can delete published 
segments on restart
URL: https://github.com/apache/incubator-druid/issues/6124
 
 
   This can happen in the following scenario.
   
   1. A Kafka index task starts publishing segments.
   2. The task publishes the segments successfully and is stopped immediately 
afterwards (by restarting the machine).
   3. When the task is restored, it restores all sequences it kept in memory 
before restarting.
   4. After reading some more events from Kafka, the task tries to publish 
segments again. These segments include the ones that were published before 
the restart, because the restored sequences still contain them.
   5. Since the segments being published a second time are already stored in 
the metastore, the publish fails.
   6. The set of published segments in the metastore differs from the set of 
segments the task is trying to publish, because the task has read more data.
   7. The task concludes that the publish actually failed and removes the 
already published segments from deep storage.
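
   The scenario can be reproduced with a small toy model. This is only a 
sketch of the failure mode as described in the steps above, not Druid's 
code: the Metastore class, publish_and_cleanup, the segment names, and the 
use of plain sets for the metastore and deep storage are all hypothetical 
and chosen for illustration.

```python
# Toy model of the scenario above. Every name here is hypothetical;
# none of it is Druid's actual API.

class Metastore:
    def __init__(self):
        self.published = set()

    def publish(self, segments):
        """Publish all segments at once; fail if any is already stored."""
        if self.published & segments:
            return False
        self.published |= segments
        return True


def publish_and_cleanup(segments, metastore, deep_storage):
    """Push segments, try to publish them, clean up on apparent failure."""
    deep_storage.update(segments)  # segments are pushed before publishing
    if metastore.publish(segments):
        return True
    # The publish failed. Treat it as a success only if the metastore
    # already holds exactly the set we tried to publish (step 6).
    if metastore.published == segments:
        return True
    # Otherwise assume a real failure and delete what we pushed (step 7).
    deep_storage.difference_update(segments)
    return False


metastore, deep_storage = Metastore(), set()

# Steps 1-2: the first publish succeeds, then the task is stopped.
before_restart = {"seg_1", "seg_2"}
first_ok = publish_and_cleanup(before_restart, metastore, deep_storage)

# Steps 3-4: the restored sequences still contain the old segments,
# and the task has read more data, producing one more segment.
after_restart = before_restart | {"seg_3"}

# Steps 5-7: the publish fails, the sets differ, cleanup runs.
second_ok = publish_and_cleanup(after_restart, metastore, deep_storage)

# Segments that the metastore lists but deep storage no longer has.
lost = metastore.published - deep_storage
print(first_ok, second_ok, sorted(lost))
```

   In this model the cleanup removes every segment the task pushed, so the 
two segments recorded in the metastore no longer exist in deep storage. 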

