[jira] [Commented] (PULSAR-7) Possibility of data loss due to asynchronous checkpointing in kinesis consumer

Priyath Gregory (Jira) Fri, 02 Oct 2020 19:37:47 -0700


    [ 
https://issues.apache.org/jira/browse/PULSAR-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17206589#comment-17206589
 ]


Priyath Gregory commented on PULSAR-7:
--------------------------------------

This issue is tracked at [https://github.com/apache/pulsar/issues/8192]

> Possibility of data loss due to asynchronous checkpointing in kinesis consumer
> ------------------------------------------------------------------------------
>
>                 Key: PULSAR-7
>                 URL: https://issues.apache.org/jira/browse/PULSAR-7
>             Project: Pulsar
>          Issue Type: Bug
>            Reporter: Priyath Gregory
>            Priority: Major
>
> The KinesisRecordProcessor pushes each record to a blocking queue and 
> proceeds to checkpoint shard progress on every checkpoint interval. But at 
> the point of checkpointing, there is no guarantee of successful delivery of 
> pending records in the queue, which can lead to data loss in case of instance 
> failure.
> Source: 
> [https://github.com/apache/pulsar/blob/master/pulsar-io/kinesis/src/main/java/org/apache/pulsar/io/kinesis/KinesisRecordProcessor.java]
> The fix should ideally cover the following scenarios as far as I can see:
> 1. Periodic checkpointing at each checkpoint interval should be performed 
> after pending records are processed up until that point
> 2. Checkpoint when onShardEnded is invoked should be performed after pending 
> records are processed.
> 3. Checkpoint when shutdownRequested is invoked should be performed after 
> pending records are processed.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Commented] (PULSAR-7) Possibility of data loss due to asynchronous checkpointing in kinesis consumer

Reply via email to