[ https://issues.apache.org/jira/browse/KAFKA-15796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17890366#comment-17890366 ]

Stefan Huber commented on KAFKA-15796:
--------------------------------------

Hi, we are having the same issue. We need to connect to a broker that randomly
encounters authentication problems. With the default config, each thread just
stops working after it encounters an AuthenticationException. When we added
authorizationExceptionRetryInterval to the config, failed authentications were
retried, but we also noticed that we suddenly ran out of CPU after a few hours.
Looking at a profiler, I can see that a massive amount of time is spent in the
KafkaConsumer.poll() function. We were able to partially limit the impact by
tuning backoff settings and enabling restartAfterAuthExceptions. However, after
a few days at most we usually see high CPU load again and need to restart our
app manually. I am not very familiar with Kafka internals, but if there is any
data I can provide that would help fix this issue, please let me know.

> High CPU issue in Kafka Producer when Auth Failed 
> --------------------------------------------------
>
>                 Key: KAFKA-15796
>                 URL: https://issues.apache.org/jira/browse/KAFKA-15796
>             Project: Kafka
>          Issue Type: Bug
>          Components: clients, producer 
>    Affects Versions: 3.2.2, 3.2.3, 3.3.1, 3.3.2, 3.5.0, 3.4.1, 3.6.0, 3.5.1
>            Reporter: xiaotong.wang
>            Priority: Major
>         Attachments: image-2023-11-07-14-18-32-016.png
>
>
> How to reproduce
> 1. Use kafka-clients 3.x.x with the producer config enable.idempotence=true
> (the default).
> 2. Start a Kafka server that does not contain the client user's auth info.
> 3. Start the client producer. Since 3.x, the producer sends InitProducerId
> and the TransactionManager (TCM) state transitions to INITIALIZING.
> 4. The server rejects the client request, and the producer raises an
> AuthenticationException
> (org.apache.kafka.clients.producer.internals.Sender#maybeSendAndPollTransactionalRequest).
> 5. org.apache.kafka.clients.producer.internals.Sender#runOnce catches the
> AuthenticationException and calls transactionManager.authenticationFailed(e):
>
>     synchronized void authenticationFailed(AuthenticationException e) {
>         for (TxnRequestHandler request : pendingRequests)
>             request.fatalError(e);
>     }
>
> This method only handles pendingRequests; the in-flight request is missed.
> 6. The TCM state therefore stays in INITIALIZING forever, because the
> condition that guards sending a new InitProducerId request,
> currentState != State.INITIALIZING && !hasProducerId(), is never true.
> 7. When the producer sends a message, the message goes into the batch queue;
> the Sender wakes up, sets pollTimeout=0, and prepares to send it.
> 8. But before Sender#sendProducerData sends anything, it filters the
> messages: in RecordAccumulator#drain -> drainBatchesForOneNode ->
> shouldStopDrainBatchesForPartition, the check returns true when
> producerIdAndEpoch.isValid() == false, so no messages are collected.
> 9. Because the Sender keeps waking up with pollTimeout=0 but never drains
> anything, the Kafka producer network thread's CPU usage goes to 100%.
> 10. Even after the user's auth info and permissions are added on the Kafka
> server, the producer does not recover on its own.
>  
>  
>  
> Suggestion:
> Also catch AuthenticationException in
> org.apache.kafka.clients.producer.internals.Sender#maybeSendAndPollTransactionalRequest
> and complete the in-flight InitProducerId request with a failure.
>  
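To make the failure mode in steps 3 to 9 concrete, here is a minimal Java sketch of the stuck state. It is a hypothetical, heavily simplified model, not the real TransactionManager, Sender or RecordAccumulator: the classes StuckInitProducerIdModel, TransactionManagerModel and InitProducerIdRequest, the helpers stuckProducer, drain and countSpins, and the 100 ms fallback poll timeout are all invented for illustration. Only the method names and the guard condition are taken from the report. The model shows that once authenticationFailed ignores the in-flight InitProducerId, the guard never enqueues a new request, nothing is drained, and every Sender iteration wakes with pollTimeout=0.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical, simplified model of steps 3-9. These are NOT the real Kafka
// client classes; the names only mirror those quoted in the report.
public class StuckInitProducerIdModel {

    enum State { UNINITIALIZED, INITIALIZING, READY }

    /** Stand-in for an InitProducerId request handler (TxnRequestHandler). */
    static final class InitProducerIdRequest {
        boolean failed = false;

        void fatalError(Exception e) { failed = true; }
    }

    static final class TransactionManagerModel {
        State currentState = State.UNINITIALIZED;
        long producerId = -1L;                          // -1 means "no producer id yet"
        final Deque<InitProducerIdRequest> pendingRequests = new ArrayDeque<>();
        InitProducerIdRequest inFlightRequest = null;   // already sent to the broker

        boolean hasProducerId() { return producerId >= 0; }

        // Step 6: a new InitProducerId is only enqueued when the state is not
        // INITIALIZING and there is no producer id yet.
        void bumpIdempotentEpochAndResetIdIfNeeded() {
            if (currentState != State.INITIALIZING && !hasProducerId()) {
                currentState = State.INITIALIZING;
                pendingRequests.add(new InitProducerIdRequest());
            }
        }

        // Step 3: the queued request leaves pendingRequests and is in flight.
        void sendNextRequest() { inFlightRequest = pendingRequests.poll(); }

        // Step 5: only pending requests are failed; the in-flight one is ignored.
        void authenticationFailed(Exception e) {
            for (InitProducerIdRequest request : pendingRequests) {
                request.fatalError(e);
            }
        }
    }

    /** Builds the state reached after step 5. */
    static TransactionManagerModel stuckProducer() {
        TransactionManagerModel tm = new TransactionManagerModel();
        tm.bumpIdempotentEpochAndResetIdIfNeeded();     // UNINITIALIZED -> INITIALIZING
        tm.sendNextRequest();                           // InitProducerId is now in flight
        tm.authenticationFailed(new RuntimeException("authentication failed"));
        return tm;
    }

    // Step 8: nothing is drained while there is no valid producer id.
    static int drain(TransactionManagerModel tm, int queuedBatches) {
        return tm.hasProducerId() ? queuedBatches : 0;
    }

    /** Counts Sender iterations that wake with pollTimeout=0 yet send nothing. */
    static int countSpins(TransactionManagerModel tm, int queuedBatches, int iterations) {
        int spins = 0;
        for (int i = 0; i < iterations; i++) {          // bounded here; the real loop is not
            tm.bumpIdempotentEpochAndResetIdIfNeeded(); // no-op: state is INITIALIZING
            long pollTimeoutMs = queuedBatches > 0 ? 0 : 100; // step 7: data is waiting
            int drained = drain(tm, queuedBatches);
            if (drained == 0 && pollTimeoutMs == 0) {
                spins++;
            }
        }
        return spins;
    }

    public static void main(String[] args) {
        TransactionManagerModel tm = stuckProducer();
        int spins = countSpins(tm, 1, 1_000);
        System.out.println("state=" + tm.currentState + " spins=" + spins);
    }
}
```

Running main prints state=INITIALIZING spins=1000: every one of the bounded iterations woke up immediately and had nothing to send, which is the busy loop described in step 9.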

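The reporter's suggestion can be sketched with the same kind of model. Again this is hypothetical and not a patch against the Kafka client: FailInFlightOnAuthErrorModel, initProducerIdSucceeded, recover and initProducerIdAttempts are invented names. The key assumption is that failing the in-flight InitProducerId request puts the model back into UNINITIALIZED, so the guard from step 6 can enqueue a fresh request once the broker accepts the credentials. The real client might instead move to a fatal or abortable error state, in which case recovery would look different.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical, simplified model of the suggested fix; NOT the real Kafka client code.
public class FailInFlightOnAuthErrorModel {

    enum State { UNINITIALIZED, INITIALIZING, READY }

    static final class TransactionManagerModel {
        State currentState = State.UNINITIALIZED;
        long producerId = -1L;                       // -1 means "no producer id yet"
        final Deque<String> pendingRequests = new ArrayDeque<>();
        String inFlightRequest = null;               // request already sent to the broker
        int initProducerIdAttempts = 0;

        boolean hasProducerId() { return producerId >= 0; }

        // Same guard as in step 6 of the report.
        void bumpIdempotentEpochAndResetIdIfNeeded() {
            if (currentState != State.INITIALIZING && !hasProducerId()) {
                currentState = State.INITIALIZING;
                pendingRequests.add("InitProducerId");
            }
        }

        void sendNextRequest() {
            inFlightRequest = pendingRequests.poll();
            if (inFlightRequest != null) {
                initProducerIdAttempts++;
            }
        }

        // Suggested behaviour: drop pending requests AND fail the in-flight one.
        // Assumption: a failed InitProducerId returns the model to UNINITIALIZED.
        void authenticationFailed(Exception e) {
            pendingRequests.clear();
            if (inFlightRequest != null) {
                inFlightRequest = null;
                currentState = State.UNINITIALIZED;
            }
        }

        // The broker accepted InitProducerId (e.g. after credentials were added).
        void initProducerIdSucceeded(long id) {
            inFlightRequest = null;
            producerId = id;
            currentState = State.READY;
        }
    }

    /** First attempt fails authentication; the retry succeeds with producer id 42. */
    static TransactionManagerModel recover() {
        TransactionManagerModel tm = new TransactionManagerModel();
        tm.bumpIdempotentEpochAndResetIdIfNeeded();  // UNINITIALIZED -> INITIALIZING
        tm.sendNextRequest();                        // attempt 1 in flight
        tm.authenticationFailed(new RuntimeException("authentication failed"));
        tm.bumpIdempotentEpochAndResetIdIfNeeded();  // guard passes again: new request
        tm.sendNextRequest();                        // attempt 2 in flight
        tm.initProducerIdSucceeded(42L);
        return tm;
    }

    public static void main(String[] args) {
        TransactionManagerModel tm = recover();
        System.out.println(tm.currentState + " attempts=" + tm.initProducerIdAttempts);
    }
}
```

Running main prints READY attempts=2: because the in-flight request is failed rather than forgotten, the state leaves INITIALIZING and the next Sender iteration can retry instead of spinning.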


--
This message was sent by Atlassian Jira
(v8.20.10#820010)
