[jira] [Updated] (FLINK-4080) Kinesis consumer not exactly-once if stopped in the middle of processing aggregated records

Tzu-Li (Gordon) Tai (JIRA) Wed, 15 Jun 2016 09:48:19 -0700

     [ https://issues.apache.org/jira/browse/FLINK-4080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Tzu-Li (Gordon) Tai updated FLINK-4080:
---------------------------------------
    Description: 
ManualExactlyOnceTest occasionally fails for me when I run it several times.

Kinesis records de-aggregated from the same aggregated batch share one sequence 
number and differ only in their sub-sequence numbers 
(http://docs.aws.amazon.com/streams/latest/dev/kinesis-kpl-consumer-deaggregation.html).
The consumer currently commits state every time it finishes processing a 
record, including de-aggregated ones. This is a bug: committing after only part 
of a batch incorrectly marks all remaining records of that de-aggregated batch 
as processed in the state, so they are skipped if the consumer is stopped and 
restored in the middle of the batch.

Proposed fix: Use the extended `UserRecord` class in KCL to represent all 
records (both non-aggregated and de-aggregated) instead of the basic `Record` 
class. This tells us whether a record was originally part of an aggregated 
batch. When we encounter a de-aggregated record, we don't update state until we 
have finished processing the last record of its batch.
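
A minimal sketch of the difference between the two commit policies, not the 
connector's actual code. `Rec`, `committedAfter` and `lostAfterRestore` are 
hypothetical names made up for illustration, the `lastOfBatch` flag is an 
assumption about how the end of a batch would be detected, and sequence numbers 
are simplified to `long` values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class AggregatedCommitSketch {

    /** Hypothetical stand-in for a (de-aggregated) Kinesis record; not the KCL class. */
    static final class Rec {
        final long sequenceNumber;    // shared by all records of one aggregated batch
        final long subSequenceNumber; // position inside the aggregated batch
        final boolean aggregated;     // true if the record came out of an aggregated batch
        final boolean lastOfBatch;    // assumption: we can tell the final record of a batch
        final String payload;

        Rec(long sequenceNumber, long subSequenceNumber, boolean aggregated,
            boolean lastOfBatch, String payload) {
            this.sequenceNumber = sequenceNumber;
            this.subSequenceNumber = subSequenceNumber;
            this.aggregated = aggregated;
            this.lastOfBatch = lastOfBatch;
            this.payload = payload;
        }
    }

    /**
     * Processes the first stopAfter records and returns the sequence number held
     * in state at that point (-1 if nothing was committed yet).
     */
    static long committedAfter(List<Rec> shard, int stopAfter, boolean batchAware) {
        long committed = -1;
        for (int i = 0; i < stopAfter && i < shard.size(); i++) {
            Rec r = shard.get(i);
            // Per-record policy: always commit.
            // Batch-aware policy: commit only once a whole batch is done.
            boolean safeToCommit = !batchAware || !r.aggregated || r.lastOfBatch;
            if (safeToCommit) {
                committed = r.sequenceNumber;
            }
        }
        return committed;
    }

    /**
     * Payloads that were not processed before the stop and will not be re-read
     * after a restore, because their sequence number is already marked as done.
     */
    static List<String> lostAfterRestore(List<Rec> shard, int stopAfter, long committed) {
        List<String> lost = new ArrayList<>();
        for (int i = stopAfter; i < shard.size(); i++) {
            Rec r = shard.get(i);
            if (r.sequenceNumber <= committed) {
                lost.add(r.payload);
            }
        }
        return lost;
    }

    public static void main(String[] args) {
        List<Rec> shard = Arrays.asList(
            new Rec(1, 0, false, true, "a"),
            new Rec(2, 0, true, false, "b0"),
            new Rec(2, 1, true, false, "b1"),
            new Rec(2, 2, true, true, "b2"),
            new Rec(3, 0, false, true, "c"));

        int stopAfter = 2; // stopped after "a" and "b0", i.e. in the middle of batch 2
        long perRecord = committedAfter(shard, stopAfter, false);
        long batchAware = committedAfter(shard, stopAfter, true);

        System.out.println("per-record commit loses:  " + lostAfterRestore(shard, stopAfter, perRecord));
        System.out.println("batch-aware commit loses: " + lostAfterRestore(shard, stopAfter, batchAware));
    }
}
```

With the per-record policy the state already points at sequence number 2 when 
the consumer stops, so `b1` and `b2` are never read. With the batch-aware 
policy the state still points at sequence number 1, so the whole batch is 
re-read after a restore. The sketch does not model how that re-read interacts 
with records of the batch that were already emitted downstream.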

  was:
I've occasionally experienced unsuccessful ManualExactlyOnceTest after several 
tries.

Kinesis records of the same aggregated batch will have the same sequence 
number, and different sub-sequence numbers 
(http://docs.aws.amazon.com/streams/latest/dev/kinesis-kpl-consumer-deaggregation.html).
 The current state of the consumer is committing state every time it finishes 
processing a record, even de-aggregated ones. This is a bug since this will 
incorrectly mark all remaining records of the de-aggregated batch as processed 
in the state.

Proposed fix: Use the extended `UserRecord` class in KCL to represent all 
records (either non- or de-aggregated) instead of the basic `Record` class. 
This gives access to whether or not the record was originally aggregated. If we 
encounter a de-aggregated record, don't update state until we finished 
processing the last record of the batch.


> Kinesis consumer not exactly-once if stopped in the middle of processing 
> aggregated records
> -------------------------------------------------------------------------------------------
>
>                 Key: FLINK-4080
>                 URL: https://issues.apache.org/jira/browse/FLINK-4080
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Kinesis Connector, Streaming Connectors
>            Reporter: Tzu-Li (Gordon) Tai
>            Assignee: Tzu-Li (Gordon) Tai
>            Priority: Critical
>             Fix For: 1.1.0



