[ https://issues.apache.org/jira/browse/KAFKA-3310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15173295#comment-15173295 ]

Jun Rao commented on KAFKA-3310:
--------------------------------

[~aauradkar], that depends. In this case, the NPE is triggered directly when 
handling the fetch request in KafkaApis. The throttle time sensor is actually 
recorded before we add the request to the delay queue. So, we will send an 
empty fetch response with an unexpected error. However, the same NPE could be 
triggered when we try to complete a fetch request from the fetch purgatory. In 
that case, we won't even be able to send a fetch response, so the fetch request 
will time out. What's worse is that there could be other fetch requests (both 
consumer and follower) in the fetch purgatory on the same key. Since we hit 
the unexpected exception while evaluating the completeness of this particular 
fetch request, we will skip checking the other fetch requests on the same 
chain and may therefore delay them.
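
To illustrate the chain problem, here is a minimal sketch. It is not Kafka's 
purgatory code: WatcherChainSketch, checkChain and demo are hypothetical names, 
and each delayed fetch is modeled as a plain BooleanSupplier. It only shows how 
one exception thrown while checking a request leaves the later requests on the 
same key unchecked.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BooleanSupplier;

// Hypothetical, simplified stand-in for the list of delayed operations
// watching one purgatory key. These are not Kafka's actual classes.
class WatcherChainSketch {

    // Evaluates each delayed operation in order and returns how many
    // were evaluated before the loop ended or an NPE cut it short.
    static int checkChain(List<BooleanSupplier> delayedOps) {
        int evaluated = 0;
        try {
            for (BooleanSupplier tryComplete : delayedOps) {
                evaluated++;
                tryComplete.getAsBoolean();
            }
        } catch (NullPointerException e) {
            // The exception abandons the loop; later operations are never checked.
        }
        return evaluated;
    }

    // Three delayed fetches on one key; the second one hits the NPE.
    static int demo() {
        List<BooleanSupplier> chain = Arrays.asList(
            () -> true,
            () -> { throw new NullPointerException("missing sensor"); },
            () -> true);
        return checkChain(chain);
    }
}
```

In this sketch the third operation is never evaluated, which corresponds to 
the other fetch requests on the chain being delayed until something else 
triggers a check or they time out.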

It seems that this problem can show up pretty easily. Just upgrade the broker 
to 0.9.0, start a consumer, wait for more than an hour, then set the consumer 
quota. If the consumer fetch request is now throttled, we will hit the NPE.

Recording 0 on the throttle time sensor probably fixes most of the problem, 
but I am not sure it fixes it completely. Since the two sensors are not 
updated at exactly the same time, it seems that it's still possible for the 
throttle time sensor to expire before the quota sensor?
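
The remaining window can be sketched with plain timestamps. This is an 
illustration only: ExpiryWindowSketch and its methods are hypothetical, the 
1 hour expiry comes from the issue description, and the order in which the two 
sensors are recorded (throttle time sensor first here) is an assumption.

```java
// Hypothetical timing model; it does not use Kafka's metrics classes.
class ExpiryWindowSketch {
    // Inactivity period after which a sensor is removed (1 hour per the issue).
    static final long EXPIRY_MS = 60 * 60 * 1000L;

    // A sensor counts as expired once it has gone unrecorded for longer than EXPIRY_MS.
    static boolean isExpired(long lastRecordMs, long nowMs) {
        return nowMs - lastRecordMs > EXPIRY_MS;
    }

    // True when an expiry check at checkMs would remove the throttle time
    // sensor but keep the quota sensor, which is the state that leads to the NPE.
    static boolean onlyThrottleExpired(long throttleRecordedMs,
                                       long quotaRecordedMs,
                                       long checkMs) {
        return isExpired(throttleRecordedMs, checkMs)
            && !isExpired(quotaRecordedMs, checkMs);
    }
}
```

For example, if the throttle time sensor was last recorded at 1000 ms and the 
quota sensor at 1005 ms, then 
onlyThrottleExpired(1000, 1005, 1000 + EXPIRY_MS + 1) is true: a check that 
lands inside that 5 ms gap sees only one of the two sensors as expired. With 
identical timestamps the method returns false for every check time.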

> fetch requests can trigger repeated NPE when quota is enabled
> -------------------------------------------------------------
>
>                 Key: KAFKA-3310
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3310
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.9.0.1
>            Reporter: Jun Rao
>
> We saw the following NPE when consumer quota is enabled. NPE is triggered on 
> every fetch request from the client.
> java.lang.NullPointerException
>         at kafka.server.ClientQuotaManager.recordAndMaybeThrottle(ClientQuotaManager.scala:122)
>         at kafka.server.KafkaApis.kafka$server$KafkaApis$$sendResponseCallback$3(KafkaApis.scala:419)
>         at kafka.server.KafkaApis$$anonfun$handleFetchRequest$1.apply(KafkaApis.scala:436)
>         at kafka.server.KafkaApis$$anonfun$handleFetchRequest$1.apply(KafkaApis.scala:436)
>         at kafka.server.ReplicaManager.fetchMessages(ReplicaManager.scala:481)
>         at kafka.server.KafkaApis.handleFetchRequest(KafkaApis.scala:431)
>         at kafka.server.KafkaApis.handle(KafkaApis.scala:69)
>         at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:60)
>         at java.lang.Thread.run(Thread.java:745)
> One possible cause of this is the logic of removing inactive sensors. 
> Currently, in ClientQuotaManager, we create two sensors per clientId: a 
> throttleTimeSensor and a quotaSensor. Each sensor expires if it's not 
> actively updated for 1 hour. What can happen is that initially, the quota is 
> not exceeded. So, quotaSensor is being updated actively, but 
> throttleTimeSensor is not. At some point, throttleTimeSensor is removed by 
> the expiring thread. Now, we are in a situation where quotaSensor is 
> registered, but throttleTimeSensor is not. Later on, if the quota is 
> exceeded, we will hit the above NPE when trying to update throttleTimeSensor.
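
The sequence described above can be reproduced in a small model. This is a 
sketch under assumptions, not the ClientQuotaManager code: SensorExpirySketch, 
its Sensor class, the map-based registry and the sensor names are all 
hypothetical stand-ins. It shows how a client that stays under quota for more 
than the expiry period ends up with only one of its two sensors registered.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a metrics registry; names are illustrative only.
class SensorExpirySketch {
    // Inactivity period after which a sensor is removed (1 hour per the issue).
    static final long EXPIRY_MS = 60 * 60 * 1000L;

    static class Sensor {
        long lastRecordMs;
        Sensor(long nowMs) { lastRecordMs = nowMs; }
        void record(long nowMs) { lastRecordMs = nowMs; }
    }

    final Map<String, Sensor> sensors = new HashMap<>();

    void register(String name, long nowMs) {
        sensors.put(name, new Sensor(nowMs));
    }

    // Mimics the expiring thread: drops sensors not recorded within EXPIRY_MS.
    void expireInactive(long nowMs) {
        sensors.values().removeIf(s -> nowMs - s.lastRecordMs > EXPIRY_MS);
    }

    // Returns true when the quota sensor survives but the throttle time
    // sensor lookup yields null, the state in which a later record() call
    // on the throttle time sensor would throw an NPE.
    static boolean throttleLookupIsNull() {
        SensorExpirySketch registry = new SensorExpirySketch();
        registry.register("quota-client1", 0L);
        registry.register("throttle-client1", 0L);

        // The client stays under quota, so only the quota sensor is recorded.
        long later = EXPIRY_MS + 1;
        registry.sensors.get("quota-client1").record(later);
        registry.expireInactive(later);

        return registry.sensors.get("quota-client1") != null
            && registry.sensors.get("throttle-client1") == null;
    }
}
```

In this model throttleLookupIsNull() returns true. Any code that looks up 
both sensors once the quota is exceeded therefore has to handle the missing 
one, for instance by re-creating it or by keeping both alive together.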



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
