Alaksiej Ščarbaty created NIFI-15293:
----------------------------------------
Summary: ConsumeKinesis can checkpoint not written records
Key: NIFI-15293
URL: https://issues.apache.org/jira/browse/NIFI-15293
Project: Apache NiFi
Issue Type: Bug
Components: Extensions
Affects Versions: 2.6.0
Reporter: Alaksiej Ščarbaty
Assignee: Alaksiej Ščarbaty
When checkpointing consumed Kinesis records, the processor uses checkpoint
method ([code
ref|https://github.com/apache/nifi/blob/main/nifi-extension-bundles/nifi-aws-bundle/nifi-aws-kinesis/src/main/java/org/apache/nifi/processors/aws/kinesis/MemoryBoundRecordBuffer.java#L647]).
The checkpoint method without the arguments uses the latest *received*
record's sequence number.
Due to the async nature of the processor it's possible that some records which
were received from Kinesis haven't been written to FlowFiles yet and
checkpointing them might result in data loss.
Sequence:
# Receive a record with a sequence number 1.
# Write it to a FlowFile.
# While in step 2, receive a record with a sequence number 2.
# On the FlowFile with a record 1 commit, checkpoint the sequence number.
# The latest received record has sequence number 2, and this number will be
checkpointed.
# The processor shuts down abruptly, thus record 2 is never written.
The processor should checkpoint sequence numbers of records written to
FlowFiles only.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)