[ https://issues.apache.org/jira/browse/FLINK-10455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16671674#comment-16671674 ]
ASF GitHub Bot commented on FLINK-10455:
----------------------------------------
azagrebin opened a new pull request #6989: [FLINK-10455][Kafka Tx] Close
transactional producers in case of failure and termination
URL: https://github.com/apache/flink/pull/6989
## What is the purpose of the change
This PR addresses a potential leak of resources associated with unclosed
Kafka transactional producers in case of commit failure or task shutdown.
## Brief change log
- always close the producer even if the commit fails in
`TwoPhaseCommitSinkFunction.notifyCheckpointComplete`
- close pending transactions in the `close()` method of the Flink Kafka
producer in case of task shutdown
- continue committing the remaining transactions in
`TwoPhaseCommitSinkFunction.notifyCheckpointComplete` if any of them fails
(see the sketch below)
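The resulting commit loop is sketched below. This is a minimal illustration
of the pattern, not the actual Flink code: `TXN`, `commit()` and
`recycleProducer()` are hypothetical placeholders for the sink's transaction
handle and producer pooling.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Minimal sketch of the fault-tolerant commit loop: commit every pending
// transaction up to the completed checkpoint, remember the first failure,
// and always release the producer so nothing leaks across restarts.
abstract class CommitLoopSketch<TXN> {

    // pending transactions keyed by the checkpoint id that created them
    private final NavigableMap<Long, TXN> pendingCommitTransactions = new TreeMap<>();

    abstract void commit(TXN transaction) throws Exception;

    // placeholder for closing or recycling the transactional producer
    abstract void recycleProducer(TXN transaction);

    public void notifyCheckpointComplete(long checkpointId) throws Exception {
        Exception firstError = null;
        Iterator<Map.Entry<Long, TXN>> it =
                pendingCommitTransactions.headMap(checkpointId, true).entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<Long, TXN> entry = it.next();
            try {
                commit(entry.getValue());
            } catch (Exception e) {
                // keep committing the remaining transactions instead of
                // bailing out; rethrow the first failure at the end
                if (firstError == null) {
                    firstError = e;
                }
            } finally {
                // release the producer even if the commit failed
                recycleProducer(entry.getValue());
                it.remove();
            }
        }
        if (firstError != null) {
            throw firstError;
        }
    }
}
```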
## Verifying this change
This change is covered by existing tests.
## Does this pull request potentially affect one of the following parts:
- Dependencies (does it add or upgrade a dependency): (no)
- The public API, i.e., is any changed class annotated with
`@Public(Evolving)`: (no)
- The serializers: (no)
- The runtime per-record code paths (performance sensitive): (no)
- Anything that affects deployment or recovery: JobManager (and its
components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes)
- The S3 file system connector: (no)
## Documentation
- Does this pull request introduce a new feature? (no)
- If yes, how is the feature documented? (not applicable)
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
> Potential Kafka producer leak in case of failures
> -------------------------------------------------
>
> Key: FLINK-10455
> URL: https://issues.apache.org/jira/browse/FLINK-10455
> Project: Flink
> Issue Type: Bug
> Components: Kafka Connector
> Affects Versions: 1.5.2
> Reporter: Nico Kruber
> Assignee: Andrey Zagrebin
> Priority: Major
> Labels: pull-request-available
>
> If the Kafka brokers' timeout is too low for our checkpoint interval [1], we
> may get a {{ProducerFencedException}}. The documentation around
> {{ProducerFencedException}} explicitly states that we should close the
> producer after encountering it.
> Looking at the code, this doesn't seem to actually be done in
> {{FlinkKafkaProducer011}}. Also, in case one transaction's commit in
> {{TwoPhaseCommitSinkFunction#notifyCheckpointComplete}} fails with an
> exception, we don't clean up (nor try to commit) any other transaction.
> -> From what I see, {{TwoPhaseCommitSinkFunction#notifyCheckpointComplete}}
> simply iterates over the {{pendingCommitTransactions}}, which is not touched
> during {{close()}}.
> Now if we restart the failing job on the same Flink cluster, any resources
> from the previous attempt will still linger around.
> [1]
> https://ci.apache.org/projects/flink/flink-docs-release-1.5/dev/connectors/kafka.html#kafka-011
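For reference, the cleanup that the `ProducerFencedException` javadoc calls
for looks roughly like the following standalone sketch; the broker address
and `transactional.id` are placeholders, and this is not the Flink connector
code:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.common.errors.ProducerFencedException;

public class FencedProducerCleanup {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder
        props.put("transactional.id", "example-txn-id");  // placeholder
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        KafkaProducer<String, String> producer = new KafkaProducer<>(props);
        try {
            producer.initTransactions();
            producer.beginTransaction();
            // ... send records ...
            producer.commitTransaction();
        } catch (ProducerFencedException e) {
            // another producer with the same transactional.id took over,
            // e.g. after the broker-side transaction timeout expired; the
            // fenced instance cannot be reused and must be closed, or its
            // resources linger until the JVM exits
            producer.close();
        }
    }
}
```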
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)