[ 
https://issues.apache.org/jira/browse/KAFKA-8089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16791509#comment-16791509
 ] 

Rajini Sivaram commented on KAFKA-8089:
---------------------------------------

We improved authentication failure handling under KIP-152 
(https://cwiki.apache.org/confluence/display/KAFKA/KIP-152+-+Improve+diagnostics+for+SASL+authentication+failures).
At the time, the focus was on improving diagnostics for the most common cause 
of authentication failures, which was misconfigured security. For the first 
call in producers (wait for metadata), consumers (wait for coordinator), etc., 
we propagate any authentication failure to the caller without retrying. 

We didn't implement propagation for credential expiry at the time, but as 
described in this JIRA, it would be useful to do that as well. Credentials that 
expire on the main thread will result in authentication failures in the same 
way as invalid credentials, since both go through a common code path. But we 
still need to implement failure propagation for background threads: the 
consumer heartbeat thread, the producer sender thread, etc.
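
As a rough illustration of what propagation from a background thread could 
look like, here is a minimal sketch in plain Java. It is not the Kafka client 
code: AuthFailurePropagation, AuthFailure, SketchClient and 
recordBackgroundFailure are hypothetical names invented for this example, and 
the real clients may well handle this differently. The background thread 
records the failure, and the next poll() on the application thread rethrows it.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch only: none of these classes are Kafka APIs.
final class AuthFailurePropagation {

    /** Stand-in for an authentication error such as an SSL handshake failure. */
    static final class AuthFailure extends RuntimeException {
        AuthFailure(String message) {
            super(message);
        }
    }

    /** Minimal client whose background thread hands failures to the caller's thread. */
    static final class SketchClient {
        private final AtomicReference<AuthFailure> pendingFailure = new AtomicReference<>();

        /** Called from a background thread when a new connection fails authentication. */
        void recordBackgroundFailure(AuthFailure failure) {
            // Keep only the first failure; later ones add no information.
            pendingFailure.compareAndSet(null, failure);
        }

        /** Called from the application thread; rethrows a recorded failure once. */
        String poll() {
            AuthFailure failure = pendingFailure.getAndSet(null);
            if (failure != null) {
                throw failure;
            }
            return "records";
        }
    }

    public static void main(String[] args) throws InterruptedException {
        SketchClient client = new SketchClient();
        System.out.println(client.poll());

        // Simulate a heartbeat-style thread hitting an authentication failure.
        Thread heartbeat = new Thread(
                () -> client.recordBackgroundFailure(new AuthFailure("SSL handshake failed")),
                "heartbeat");
        heartbeat.start();
        heartbeat.join();

        try {
            client.poll();
        } catch (AuthFailure e) {
            System.out.println("poll() failed: " + e.getMessage());
        }
    }
}
```

With something along these lines, an application such as the one in this 
JIRA would see the failure on its next poll() rather than only in the log, 
and could exit and restart with fresh credentials.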

> High level consumer from MirrorMaker is slow to deal with SSL certification 
> expiration
> --------------------------------------------------------------------------------------
>
>                 Key: KAFKA-8089
>                 URL: https://issues.apache.org/jira/browse/KAFKA-8089
>             Project: Kafka
>          Issue Type: Bug
>          Components: clients, consumer
>    Affects Versions: 2.0.0
>            Reporter: Henry Cai
>            Assignee: Rajini Sivaram
>            Priority: Critical
>
> We have been using Kafka 2.0's MirrorMaker (which uses the high-level 
> consumer) to do replication.  The topic is SSL-enabled and the certificate 
> will expire at a random time within 12 hours.  When the certificate expires, 
> we see many SSL-related exceptions in the log:
>  
> [2019-03-07 18:02:54,128] ERROR [Consumer 
> clientId=kafkamirror-euw1-use1-m10nkafka03-1, 
> groupId=kafkamirror-euw1-use1-m10nkafka03] Connection to node 3005 failed 
> authentication due to: SSL handshake failed 
> (org.apache.kafka.clients.NetworkClient)
> This error will repeat for several hours.
> However, even with the SSL error, the preexisting socket connection still 
> works, so the main fetching activity is actually not affected. The metadata 
> operations from the client and the heartbeats from the heartbeat thread are 
> affected, since they might open new socket connections.  I think those 
> errors most likely originate from those side activities.
> The situation lasts several hours until the main fetcher thread tries to 
> open a new connection (usually due to a consumer rebalance); the SSL 
> authentication exception then aborts the operation and MirrorMaker exits.
> During those several hours, the client isn't able to get the latest 
> metadata and heartbeats also falter (we see rebalancing triggered because of 
> this).
> When NetworkClient.processDisconnection() prints the above ERROR message, 
> could it just throw the AuthenticationException up? This would kill 
> KafkaConsumer.poll() and speed up the certificate recycle (in our case, we 
> will restart MirrorMaker with the new certificate).
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
