[ https://issues.apache.org/jira/browse/FLINK-8132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16262565#comment-16262565 ]
Kostas Kloudas commented on FLINK-8132:
---------------------------------------

Is this a bug or a blocker?

> FlinkKafkaProducer011 can commit incorrect transaction during recovery
> ----------------------------------------------------------------------
>
>                 Key: FLINK-8132
>                 URL: https://issues.apache.org/jira/browse/FLINK-8132
>             Project: Flink
>          Issue Type: Bug
>          Components: Kafka Connector
>            Reporter: Piotr Nowojski
>            Assignee: Piotr Nowojski
>            Priority: Blocker
>             Fix For: 1.4.0
>
>
> Faulty scenario with a producer pool of 2:
> 1. started transaction 1 with producerA, wrote record 42
> 2. checkpoint 1 triggered, pre-committing txn1, started txn2 with producerB,
> wrote record 43
> 3. checkpoint 1 completed, committing txn1, returning producerA to the pool
> 4. checkpoint 2 triggered, committing txn2, started txn3 with producerA,
> wrote record 44
> 5. crash....
> 6. recover to checkpoint 1, txn1 from producerA found in
> "pendingCommitTransactions", attempting to recoverAndCommit(txn1)
> 7. unfortunately txn1 and txn3, coming from the same producer, are identical
> from the Kafka broker's perspective, and thus txn3 is committed
> The result is that both records 42 and 44 are committed.
> The proposed solution is to postpone returning producers to the pool until we
> are sure that the previous checkpoint (for which the given producer was used)
> will not be used for recovery (at least one more checkpoint has completed).
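
To make the faulty scenario and the proposed fix easier to follow, here is a minimal toy simulation. It is not Flink or Kafka code: `ProducerPoolSketch`, `write`, `commit`, `run`, the `deferReturn` flag and the extra `producerC` are all hypothetical names introduced for illustration. The only assumption taken from the issue is that the broker cannot tell two transactions of the same producer apart, so committing "txn1 of producerA" commits whatever producerA currently has open.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model (assumption): the broker tracks at most one open transaction per producer id. */
class ProducerPoolSketch {
    // Records written but not yet committed, keyed by producer id.
    static final Map<String, List<Integer>> open = new HashMap<>();
    // Records visible to consumers after a commit.
    static final List<Integer> committed = new ArrayList<>();

    static void write(String producer, int record) {
        open.computeIfAbsent(producer, k -> new ArrayList<>()).add(record);
    }

    // Commits whatever is currently open for this producer id; a no-op if nothing is open.
    static void commit(String producer) {
        List<Integer> records = open.remove(producer);
        if (records != null) {
            committed.addAll(records);
        }
    }

    /** Replays steps 1-7 of the scenario; deferReturn=true applies the proposed fix. */
    static List<Integer> run(boolean deferReturn) {
        open.clear();
        committed.clear();
        Deque<String> pool = new ArrayDeque<>(Arrays.asList("producerA", "producerB"));

        String producerA = pool.poll();
        write(producerA, 42);                  // 1. txn1 with producerA

        String pendingInCheckpoint1 = producerA; // 2. checkpoint 1 state remembers txn1
        String producerB = pool.poll();
        write(producerB, 43);                  //    txn2 with producerB

        commit(producerA);                     // 3. checkpoint 1 completed, txn1 committed
        if (!deferReturn) {
            pool.add(producerA);               //    buggy behaviour: producerA is reusable at once
        }

        // 4. checkpoint 2 triggered: take a producer from the pool, or (in this sketch only)
        //    fall back to a fresh hypothetical producerC when the pool is empty.
        String next = pool.isEmpty() ? "producerC" : pool.poll();
        write(next, 44);                       //    txn3

        // 5.-7. crash, restore checkpoint 1, recoverAndCommit(txn1) on producerA.
        commit(pendingInCheckpoint1);
        return new ArrayList<>(committed);
    }

    public static void main(String[] args) {
        System.out.println("immediate return: " + run(false)); // [42, 44]
        System.out.println("deferred return:  " + run(true));  // [42]
    }
}
```

With immediate return, the recovery commit on producerA picks up the open txn3 and record 44 leaks out; with deferred return, producerA has nothing open at recovery time, so only record 42 is committed. How the real connector obtains a producer while producerA is held back is not covered in this message, so the `producerC` fallback above is only a placeholder.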