[
https://issues.apache.org/jira/browse/FLINK-8132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16262878#comment-16262878
]
ASF GitHub Bot commented on FLINK-8132:
---------------------------------------
GitHub user pnowojski opened a pull request:
https://github.com/apache/flink/pull/5053
[FLINK-8132][kafka] Re-initialize transactional KafkaProducer on each
checkpoint
Previously faulty scenario with producer pool of 2.
1. started transaction 1 with producerA, written record 42
2. checkpoint 1 triggered, pre committing txn1, started txn2 with
producerB, written record 43
3. checkpoint 1 completed, committing txn1, returning producerA to the pool
4. checkpoint 2 triggered , committing txn2, started txn3 with producerA,
written record 44
5. crash....
6. recover to checkpoint 1, txn1 from producerA found to
"pendingCommitTransactions", attempting to recoverAndCommit(txn1)
7. unfortunately txn1 and txn3 from the same producers are identical from
KafkaBroker perspective and thus txn3 is being committed
result is that both records 42 and 44 are committed.
With this fix, after re-initialization txn3 will have different
producerId/epoch counters compared to txn1.
## Verifying this change
This PR adds a test that was previously failing to cover for future
regressions.
## Does this pull request potentially affect one of the following parts:
- Dependencies (does it add or upgrade a dependency): (yes / **no**)
- The public API, i.e., is any changed class annotated with
`@Public(Evolving)`: (yes / **no**)
- The serializers: (yes / **no** / don't know)
- The runtime per-record code paths (performance sensitive): (yes /
**no** / don't know)
- Anything that affects deployment or recovery: JobManager (and its
components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes / **no** / don't know)
- The S3 file system connector: (yes / **no** / don't know)
## Documentation
- Does this pull request introduce a new feature? (yes / **no**)
- If yes, how is the feature documented? (**not applicable** / docs /
JavaDocs / not documented)
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/pnowojski/flink f8132
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/5053.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #5053
----
commit 6f0f56ca25a12da513253d738d14d8516a808bd8
Author: Piotr Nowojski <[email protected]>
Date: 2017-11-22T10:37:48Z
[hotfix][kafka] Throw FlinkKafkaProducer011Exception with error codes
instead of generic Exception
commit b79a372d589df54f505857ed78e120d69d0ad50a
Author: Piotr Nowojski <[email protected]>
Date: 2017-11-22T14:53:08Z
[FLINK-8132][kafka] Re-initialize transactional KafkaProducer on each
checkpoint
Previously faulty scenario with producer pool of 2.
1. started transaction 1 with producerA, written record 42
2. checkpoint 1 triggered, pre committing txn1, started txn2 with
producerB, written record 43
3. checkpoint 1 completed, committing txn1, returning producerA to the pool
4. checkpoint 2 triggered , committing txn2, started txn3 with producerA,
written record 44
5. crash....
6. recover to checkpoint 1, txn1 from producerA found to
"pendingCommitTransactions", attempting to recoverAndCommit(txn1)
7. unfortunately txn1 and txn3 from the same producers are identical from
KafkaBroker perspective and thus txn3 is being committed
result is that both records 42 and 44 are committed.
With this fix, after re-initialization txn3 will have different
producerId/epoch counters compared to txn1.
commit a5938795c5ae0dd7022d07ee90620fb6a8ce14a9
Author: Piotr Nowojski <[email protected]>
Date: 2017-11-22T14:55:20Z
[hotfix][kafka] Remove unused method in kafka tests
----
> FlinkKafkaProducer011 can commit incorrect transaction during recovery
> ----------------------------------------------------------------------
>
> Key: FLINK-8132
> URL: https://issues.apache.org/jira/browse/FLINK-8132
> Project: Flink
> Issue Type: Bug
> Components: Kafka Connector
> Reporter: Piotr Nowojski
> Assignee: Piotr Nowojski
> Priority: Blocker
> Fix For: 1.4.0
>
>
> Faulty scenario with producer pool of 2.
> 1. started transaction 1 with producerA, written record 42
> 2. checkpoint 1 triggered, pre committing txn1, started txn2 with producerB,
> written record 43
> 3. checkpoint 1 completed, committing txn1, returning producerA to the pool
> 4. checkpoint 2 triggered , committing txn2, started txn3 with producerA,
> written record 44
> 5. crash....
> 6. recover to checkpoint 1, txn1 from producerA found to
> "pendingCommitTransactions", attempting to recoverAndCommit(txn1)
> 7. unfortunately txn1 and txn3 from the same producers are identical from
> KafkaBroker perspective and thus txn3 is being committed
> result is that both records 42 and 44 are committed.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)