[
https://issues.apache.org/jira/browse/FLINK-10455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Till Rohrmann reopened FLINK-10455:
-----------------------------------
A user reported on the ML that they are seeing a {{ClassNotFoundException}}
similar to what Nico reported:
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Class-loader-premature-closure-NoClassDefFoundError-org-apache-kafka-clients-NetworkClient-td27734.html.
I suspect that we still have a Kafka producer leak somewhere.
> Potential Kafka producer leak in case of failures
> -------------------------------------------------
>
> Key: FLINK-10455
> URL: https://issues.apache.org/jira/browse/FLINK-10455
> Project: Flink
> Issue Type: Bug
> Components: Connectors / Kafka
> Affects Versions: 1.5.2
> Reporter: Nico Kruber
> Assignee: Andrey Zagrebin
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.5.6, 1.6.3, 1.7.0
>
>
> If the Kafka brokers' transaction timeout is too low for our checkpoint
> interval [1], we may get a {{ProducerFencedException}}. The documentation
> around {{ProducerFencedException}} explicitly states that we should close
> the producer after encountering it.
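The close-on-fence pattern the {{ProducerFencedException}} javadoc prescribes can be sketched as follows. This is a minimal, self-contained simulation, not Flink's or Kafka's actual code: the {{Producer}} and {{ProducerFencedException}} classes here are stand-ins for the real Kafka transactional producer API, introduced only to illustrate that a fenced producer must be closed rather than reused.

```java
// Minimal sketch of the close-on-fence pattern (all classes are hypothetical
// stand-ins, not the real Kafka API): once a producer is fenced by another
// producer with the same transactional.id, it is permanently unusable and
// must be closed -- keeping it around is exactly the leak this ticket describes.
public class FencedProducerSketch {

    // Stand-in for org.apache.kafka.common.errors.ProducerFencedException
    static class ProducerFencedException extends RuntimeException {}

    // Stand-in for a transactional Kafka producer
    static class Producer {
        boolean fenced;  // true once the broker has fenced this producer
        boolean closed;

        void commitTransaction() {
            if (fenced) throw new ProducerFencedException();
        }

        void close() { closed = true; }
    }

    /** Commits the transaction; on fencing, closes the producer instead of leaking it. */
    static boolean commitOrClose(Producer p) {
        try {
            p.commitTransaction();
            return true;
        } catch (ProducerFencedException e) {
            p.close();  // the missing step in FLINK-10455: release the producer
            return false;
        }
    }

    public static void main(String[] args) {
        Producer p = new Producer();
        p.fenced = true;
        boolean committed = commitOrClose(p);
        System.out.println(committed + " " + p.closed); // prints "false true"
    }
}
```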
> Looking at the code, it doesn't seem like this is actually done in
> {{FlinkKafkaProducer011}}. Also, if one transaction's commit in
> {{TwoPhaseCommitSinkFunction#notifyCheckpointComplete}} fails with an
> exception, we don't clean up (or try to commit) any other transaction.
> From what I see, {{TwoPhaseCommitSinkFunction#notifyCheckpointComplete}}
> simply iterates over {{pendingCommitTransactions}}, which is not touched
> during {{close()}}.
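The cleanup that the loop above is missing could look roughly like the following. This is a hypothetical sketch under assumed names and structure ({{Txn}}, {{commitPending}}), not Flink's actual {{TwoPhaseCommitSinkFunction}}: it commits pending transactions in order and, when one commit throws, closes the failed and all remaining transactions before rethrowing, so their resources don't linger across a restart.

```java
// Hypothetical sketch (names and structure assumed, not Flink's real code):
// commit each pending transaction, but ensure the remaining ones are still
// cleaned up if one commit throws, instead of abandoning them.
import java.util.ArrayDeque;
import java.util.Deque;

public class TwoPhaseCommitSketch {

    // Stand-in for a pending transaction holding an external resource
    static class Txn {
        final boolean failsOnCommit;
        boolean committed, closed;

        Txn(boolean failsOnCommit) { this.failsOnCommit = failsOnCommit; }

        void commit() {
            if (failsOnCommit) throw new RuntimeException("commit failed");
            committed = true;
        }

        void close() { closed = true; }
    }

    /**
     * Commits pending transactions in order. On the first failure, closes the
     * failed and all remaining transactions so their resources are released,
     * then rethrows to fail the checkpoint notification.
     */
    static void commitPending(Deque<Txn> pending) {
        while (!pending.isEmpty()) {
            Txn txn = pending.peekFirst();
            try {
                txn.commit();
                pending.pollFirst();
            } catch (RuntimeException e) {
                for (Txn t : pending) t.close();  // release leftover resources
                pending.clear();
                throw e;
            }
        }
    }
}
```

The point of the sketch is only the catch block: without it, everything after the first failing commit stays in {{pendingCommitTransactions}} and is never committed or closed.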
> Now if we restart the failing job on the same Flink cluster, any resources
> from the previous attempt will still linger around.
> [1]
> https://ci.apache.org/projects/flink/flink-docs-release-1.5/dev/connectors/kafka.html#kafka-011
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)