[ https://issues.apache.org/jira/browse/FLINK-10455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16671674#comment-16671674 ]
ASF GitHub Bot commented on FLINK-10455:
----------------------------------------
azagrebin opened a new pull request #6989: [FLINK-10455][Kafka Tx] Close
transactional producers in case of failure and termination
URL: https://github.com/apache/flink/pull/6989
## What is the purpose of the change
This PR addresses a potential leak of resources associated with unclosed
Kafka transactional producers in case of commit failure or task shutdown.
## Brief change log
- always close the producer even if the commit fails in
`TwoPhaseCommitSinkFunction.notifyCheckpointComplete`
- close pending transactions in the `close()` method of the Flink Kafka
producer in case of task shutdown
- continue committing the remaining transactions in
`TwoPhaseCommitSinkFunction.notifyCheckpointComplete` if any of them fails
(see the sketch below)
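The resulting commit loop is sketched below. This is a minimal illustration
of the pattern, not the actual Flink code: `TXN`, `commit()` and
`recycleProducer()` are hypothetical placeholders for the sink's transaction
handle and producer pooling.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Minimal sketch of the fault-tolerant commit loop: commit every pending
// transaction up to the completed checkpoint, remember the first failure,
// and always release the producer so nothing leaks across restarts.
abstract class CommitLoopSketch<TXN> {

    // pending transactions keyed by the checkpoint id that created them
    private final NavigableMap<Long, TXN> pendingCommitTransactions = new TreeMap<>();

    abstract void commit(TXN transaction) throws Exception;

    // placeholder for closing or recycling the transactional producer
    abstract void recycleProducer(TXN transaction);

    public void notifyCheckpointComplete(long checkpointId) throws Exception {
        Exception firstError = null;
        Iterator<Map.Entry<Long, TXN>> it =
                pendingCommitTransactions.headMap(checkpointId, true).entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<Long, TXN> entry = it.next();
            try {
                commit(entry.getValue());
            } catch (Exception e) {
                // keep committing the remaining transactions instead of
                // bailing out; rethrow the first failure at the end
                if (firstError == null) {
                    firstError = e;
                }
            } finally {
                // release the producer even if the commit failed
                recycleProducer(entry.getValue());
                it.remove();
            }
        }
        if (firstError != null) {
            throw firstError;
        }
    }
}
```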
## Verifying this change
This change is covered by existing tests.
## Does this pull request potentially affect one of the following parts:
- Dependencies (does it add or upgrade a dependency): (no)
- The public API, i.e., is any changed class annotated with
`@Public(Evolving)`: (no)
- The serializers: (no)
- The runtime per-record code paths (performance sensitive): (no)
- Anything that affects deployment or recovery: JobManager (and its
components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes)
- The S3 file system connector: (no)
## Documentation
- Does this pull request introduce a new feature? (no)
- If yes, how is the feature documented? (not applicable)
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
> Potential Kafka producer leak in case of failures
> -------------------------------------------------
>
> Key: FLINK-10455
> URL: https://issues.apache.org/jira/browse/FLINK-10455
> Project: Flink
> Issue Type: Bug
> Components: Kafka Connector
> Affects Versions: 1.5.2
> Reporter: Nico Kruber
> Assignee: Andrey Zagrebin
> Priority: Major
> Labels: pull-request-available
>
> If the Kafka brokers' timeout is too low for our checkpoint interval [1], we
> may get a {{ProducerFencedException}}. The documentation around
> {{ProducerFencedException}} explicitly states that we should close the
> producer after encountering it.
> Looking at the code, this doesn't seem to actually be done in
> {{FlinkKafkaProducer011}}. Also, in case one transaction's commit in
> {{TwoPhaseCommitSinkFunction#notifyCheckpointComplete}} fails with an
> exception, we don't clean up (nor try to commit) any other transaction.
> -> From what I see, {{TwoPhaseCommitSinkFunction#notifyCheckpointComplete}}
> simply iterates over the {{pendingCommitTransactions}}, which is not touched
> during {{close()}}.
> Now if we restart the failing job on the same Flink cluster, any resources
> from the previous attempt will still linger around.
> [1]
> https://ci.apache.org/projects/flink/flink-docs-release-1.5/dev/connectors/kafka.html#kafka-011
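For reference, the cleanup that the `ProducerFencedException` javadoc calls
for looks roughly like the following standalone sketch; the broker address
and `transactional.id` are placeholders, and this is not the Flink connector
code:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.common.errors.ProducerFencedException;

public class FencedProducerCleanup {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder
        props.put("transactional.id", "example-txn-id");  // placeholder
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        KafkaProducer<String, String> producer = new KafkaProducer<>(props);
        try {
            producer.initTransactions();
            producer.beginTransaction();
            // ... send records ...
            producer.commitTransaction();
        } catch (ProducerFencedException e) {
            // another producer with the same transactional.id took over,
            // e.g. after the broker-side transaction timeout expired; the
            // fenced instance cannot be reused and must be closed, or its
            // resources linger until the JVM exits
            producer.close();
        }
    }
}
```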
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)