[
https://issues.apache.org/jira/browse/PULSAR-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17206589#comment-17206589
]
Priyath Gregory commented on PULSAR-7:
--------------------------------------
This issue is tracked at [https://github.com/apache/pulsar/issues/8192]
> Possibility of data loss due to asynchronous checkpointing in kinesis consumer
> ------------------------------------------------------------------------------
>
> Key: PULSAR-7
> URL: https://issues.apache.org/jira/browse/PULSAR-7
> Project: Pulsar
> Issue Type: Bug
> Reporter: Priyath Gregory
> Priority: Major
>
> The KinesisRecordProcessor pushes each record to a blocking queue and
> checkpoints shard progress at every checkpoint interval. At the point of
> checkpointing, however, there is no guarantee that the records still pending
> in the queue have been delivered successfully. If the instance fails, those
> records are lost: consumption resumes after the checkpoint, so the
> undelivered records are never read again.
> Source:
> [https://github.com/apache/pulsar/blob/master/pulsar-io/kinesis/src/main/java/org/apache/pulsar/io/kinesis/KinesisRecordProcessor.java]
> As far as I can see, the fix should ideally cover the following scenarios:
> 1. Periodic checkpointing at each checkpoint interval should be performed
> only after the records pending up to that point have been processed.
> 2. The checkpoint taken when onShardEnded is invoked should be performed
> after pending records are processed.
> 3. The checkpoint taken when shutdownRequested is invoked should be performed
> after pending records are processed.
>
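The three scenarios above share one requirement: a checkpoint must never move past a record that has not yet been delivered. A minimal sketch of one way to track that is shown below. The class SafeCheckpointTracker and its method names are hypothetical and are not part of Pulsar or the Kinesis Client Library. It assumes that records are enqueued in sequence order and that the sink reports each delivery, possibly out of order.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Optional;
import java.util.Set;

/**
 * Hypothetical helper (not part of Pulsar or the KCL): tracks which enqueued
 * records have been delivered and exposes the highest sequence number that is
 * safe to checkpoint, i.e. the end of the contiguous delivered prefix.
 */
final class SafeCheckpointTracker {
    // Sequence numbers of records that are enqueued but not yet released, in arrival order.
    private final Deque<String> inFlight = new ArrayDeque<>();
    // Sequence numbers delivered out of order, waiting for earlier records to complete.
    private final Set<String> delivered = new HashSet<>();
    // Last sequence number of the contiguous delivered prefix; null if there is none yet.
    private String safeSequence;

    /** Call when a record is pushed to the queue. */
    synchronized void recordEnqueued(String sequenceNumber) {
        inFlight.addLast(sequenceNumber);
    }

    /** Call when the sink confirms delivery of a record. */
    synchronized void recordDelivered(String sequenceNumber) {
        delivered.add(sequenceNumber);
        // Advance the safe point past every head-of-queue record that is delivered.
        while (!inFlight.isEmpty() && delivered.remove(inFlight.peekFirst())) {
            safeSequence = inFlight.pollFirst();
        }
    }

    /** Highest sequence number that can be checkpointed without risking loss. */
    synchronized Optional<String> safeCheckpoint() {
        return Optional.ofNullable(safeSequence);
    }

    /** True when no enqueued record is still awaiting delivery. */
    synchronized boolean isDrained() {
        return inFlight.isEmpty();
    }
}
```

With a tracker like this, the periodic checkpoint could pass the value of safeCheckpoint() to the checkpointer instead of checkpointing the latest record received, and the shard-end and shutdown handlers could wait until isDrained() returns true before checkpointing. How the wait is bounded (for example with a timeout) is left open here.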