Alaksiej Ščarbaty created NIFI-15293:
----------------------------------------

             Summary: ConsumeKinesis can checkpoint not written records
                 Key: NIFI-15293
                 URL: https://issues.apache.org/jira/browse/NIFI-15293
             Project: Apache NiFi
          Issue Type: Bug
          Components: Extensions
    Affects Versions: 2.6.0
            Reporter: Alaksiej Ščarbaty
            Assignee: Alaksiej Ščarbaty


When checkpointing consumed Kinesis records, the processor calls the 
no-argument checkpoint method ([code 
ref|https://github.com/apache/nifi/blob/main/nifi-extension-bundles/nifi-aws-bundle/nifi-aws-kinesis/src/main/java/org/apache/nifi/processors/aws/kinesis/MemoryBoundRecordBuffer.java#L647]).
 Called without arguments, the checkpoint method uses the sequence number of 
the latest *received* record.

Because the processor is asynchronous, some records that were received from 
Kinesis may not have been written to FlowFiles yet, and checkpointing them can 
result in data loss.

Sequence:
 # Receive a record with sequence number 1.
 # Write it to a FlowFile.
 # While step 2 is in progress, receive a record with sequence number 2.
 # When the FlowFile containing record 1 is committed, checkpoint the sequence 
number.
 # The latest received record has sequence number 2, so 2 is the number that 
gets checkpointed.
 # The processor shuts down abruptly, so record 2 is never written. Since 
consumption resumes after the checkpoint, record 2 is lost.
The processor should checkpoint only the sequence numbers of records that have 
been written to FlowFiles.
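One possible shape for this, shown only as a sketch and not as the actual fix: track the highest sequence number that has been written and hand that value to the checkpoint call. WrittenSequenceTracker and its methods are hypothetical names. The sketch assumes that sequence numbers are decimal strings comparable as integers and that records of a shard are written in the order they were received; if writes could complete out of order, the highest written number would not be a safe checkpoint.

```java
import java.math.BigInteger;
import java.util.Optional;

/** Hypothetical helper; not part of NiFi or the Kinesis client library. */
public class WrittenSequenceTracker {

    // Highest sequence number written to a FlowFile; null until the first write.
    private BigInteger highestWritten;

    /** Call only after the record has actually been written to a FlowFile. */
    public synchronized void onWritten(String sequenceNumber) {
        BigInteger candidate = new BigInteger(sequenceNumber);
        if (highestWritten == null || candidate.compareTo(highestWritten) > 0) {
            highestWritten = candidate;
        }
    }

    /** Sequence number that is safe to checkpoint, or empty if nothing was written. */
    public synchronized Optional<String> checkpointCandidate() {
        return Optional.ofNullable(highestWritten).map(BigInteger::toString);
    }

    public static void main(String[] args) {
        WrittenSequenceTracker tracker = new WrittenSequenceTracker();
        tracker.onWritten("1");
        // Record 2 has been received but not written, so it is never reported here.
        System.out.println(tracker.checkpointCandidate().orElse("nothing to checkpoint"));
    }
}
```

Receiving a record never changes the candidate, so a commit of record 1 could only checkpoint sequence number 1. If the checkpointer in use is the Kinesis Client Library's RecordProcessorCheckpointer, its overloads that accept an explicit sequence number would be the natural place to pass this value; the issue itself does not say how the fix will be implemented.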



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
