[ https://issues.apache.org/jira/browse/KAFKA-2587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14941474#comment-14941474 ]

Parth Brahmbhatt commented on KAFKA-2587:
-----------------------------------------

I looked at the code to reason about why this can happen. The reported state 
is indeed one of the valid states during our test:
https://github.com/apache/kafka/blob/5764e54de147af81aac85acc00687c23e9646a5c/core/src/test/scala/unit/kafka/security/auth/SimpleAclAuthorizerTest.scala#L217

After that line we remove all acls for that resource, add one acl back to it, 
and then remove that one acl. All of those steps pass verification:
https://github.com/apache/kafka/blob/5764e54de147af81aac85acc00687c23e9646a5c/core/src/test/scala/unit/kafka/security/auth/SimpleAclAuthorizerTest.scala#L225
 and 
https://github.com/apache/kafka/blob/5764e54de147af81aac85acc00687c23e9646a5c/core/src/test/scala/unit/kafka/security/auth/SimpleAclAuthorizerTest.scala#L226

Because we are using the same authorizer instance, that instance's cache is 
updated immediately on both add and remove:
https://github.com/apache/kafka/blob/5764e54de147af81aac85acc00687c23e9646a5c/core/src/main/scala/kafka/security/auth/SimpleAclAuthorizer.scala#L171
https://github.com/apache/kafka/blob/5764e54de147af81aac85acc00687c23e9646a5c/core/src/main/scala/kafka/security/auth/SimpleAclAuthorizer.scala#L189
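The same-instance path can be sketched as a toy Python model (not Kafka's Scala code). A dict stands in for the ZooKeeper acl paths and another for the authorizer's cache; the hypothetical `ToyAuthorizer` writes both in the same call, so a check made right after each step sees the new state. It then replays the remove-all / add-one / remove-one sequence from the test. All names and the resource string are assumptions for illustration.

```python
# Hypothetical, simplified model of ONE authorizer instance.
# This is NOT Kafka's SimpleAclAuthorizer API; dicts stand in for
# ZooKeeper and for the in-memory acl cache.


class ToyAuthorizer:
    def __init__(self):
        self.zk = {}     # resource -> set of acls (stands in for ZooKeeper)
        self.cache = {}  # resource -> set of acls (in-memory cache)

    def add_acls(self, resource, acls):
        updated = self.zk.get(resource, set()) | set(acls)
        self.zk[resource] = updated
        # The local cache is updated in the same call as the ZooKeeper write.
        self.cache[resource] = set(updated)

    def remove_acls(self, resource, acls=None):
        # acls=None means "remove all acls for the resource".
        remaining = set() if acls is None else self.zk.get(resource, set()) - set(acls)
        if remaining:
            self.zk[resource] = remaining
            self.cache[resource] = set(remaining)
        else:
            # No acls left: delete the (stand-in) path and the cache entry.
            self.zk.pop(resource, None)
            self.cache.pop(resource, None)

    def get_acls(self, resource):
        return self.cache.get(resource, set())


# The tail of the test, replayed on a single instance.
auth = ToyAuthorizer()
resource = "Topic:test"  # hypothetical resource name
auth.add_acls(resource, {"acl1", "acl2"})
auth.remove_acls(resource)            # remove all acls for the resource
assert auth.get_acls(resource) == set()
auth.add_acls(resource, {"acl1"})     # add one acl back
assert auth.get_acls(resource) == {"acl1"}
auth.remove_acls(resource, {"acl1"})  # remove that one acl
assert auth.get_acls(resource) == set()
assert resource not in auth.zk        # stand-in path is deleted
```

With a single thread touching this model, every verification passes; the failure needs a second writer to the cache.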

The only other place that can update the cache is the notification handler, 
when it handles an acl-changed notification:
https://github.com/apache/kafka/blob/5764e54de147af81aac85acc00687c23e9646a5c/core/src/main/scala/kafka/security/auth/SimpleAclAuthorizer.scala#L269

However, in that case we read the data from ZooKeeper and then update the cache. 
Even if notification processing is delayed for some reason, it should still 
read the current acls from zk before updating the cache.
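A minimal sketch of why a delay alone is harmless, assuming the handler's read and its cache write run back to back: even if every notification is processed only after all local updates have finished, each one re-reads ZooKeeper, so the cache ends up matching it. The dicts and `process_notification` are hypothetical stand-ins, not the authorizer's actual handler.

```python
# Hypothetical model: dicts stand in for ZooKeeper and the acl cache.
zk = {}
cache = {}
r = "Topic:test"  # hypothetical resource name


def process_notification(resource):
    """Read the current acls from ZooKeeper, then update the cache from that read."""
    current = set(zk.get(resource, set()))
    if current:
        cache[resource] = current
    else:
        cache.pop(resource, None)


# Local updates write ZooKeeper and the cache together and queue a notification.
pending = []
for acls in [{"acl1", "acl2"}, set(), {"acl1"}, set()]:
    if acls:
        zk[r] = set(acls)
        cache[r] = set(acls)
    else:
        zk.pop(r, None)
        cache.pop(r, None)
    pending.append(r)

# The handler runs late, after every local update has finished.
for resource in pending:
    process_notification(resource)

print(cache)  # {} : each late read saw the latest ZooKeeper state
```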
There are pathological interleavings that can lead to this failure, for example:
1) The notification handler starts, reads acls from zk, and a thread switch 
happens before it can update the cache.
2) All the other cache updates go through (remove all acls for the resource, add 
the acl, remove the acl).
3) Before verification of the last "remove one acl" step finishes, a thread 
switch happens and the notification handler updates the cache with the stale 
acls it read earlier.
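The interleaving above can be replayed deterministically in a toy model. Real threads are replaced by a fixed call order, and the handler is split into a hypothetical `handler_read` and `handler_apply` to mark where the thread switch happens; nothing in Kafka is named this way.

```python
# Hypothetical model: plain dicts stand in for ZooKeeper and the acl cache.
r = "Topic:test"                  # hypothetical resource name
zk = {r: {"acl1", "acl2"}}        # state before the test's last steps
cache = {r: {"acl1", "acl2"}}


def handler_read(resource):
    """Handler step 1: read the current acls from ZooKeeper."""
    return set(zk.get(resource, set()))


def handler_apply(resource, acls):
    """Handler step 2: write what was read into the cache, with no staleness check."""
    if acls:
        cache[resource] = acls
    else:
        cache.pop(resource, None)


def local_update(resource, acls):
    """Same-instance add/remove: ZooKeeper and cache are updated together."""
    if acls:
        zk[resource] = set(acls)
        cache[resource] = set(acls)
    else:
        zk.pop(resource, None)
        cache.pop(resource, None)


stale = handler_read(r)      # 1) handler reads acls, then is descheduled
local_update(r, set())       # 2) remove all acls for the resource
local_update(r, {"acl1"})    #    add one acl back
local_update(r, set())       #    remove that one acl
handler_apply(r, stale)      # 3) handler resumes and writes its stale read

print(sorted(zk.get(r, set())))     # []
print(sorted(cache.get(r, set())))  # ['acl1', 'acl2'] -> verification fails
```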

Even in this case, there should be follow-up notifications for adding the acl 
and removing the acl, which should again cause notification processing to read 
the state from ZooKeeper and restore the cache to the correct state. Plus, this 
interleaving seems unlikely enough that it should not happen every other day.
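A sketch of the recovery argument in the same kind of toy model: the cache starts with the stale entry left by the race, and two follow-up notifications (for the add and the remove) are still queued. Assuming each notification triggers a fresh read from ZooKeeper, draining the queue restores the cache; in this model, a check that runs before the queue is drained still sees the stale acls. The queue and `process_notification` are hypothetical.

```python
from collections import deque

r = "Topic:test"               # hypothetical resource name
zk = {}                        # ZooKeeper after the last "remove one acl"
cache = {r: {"acl1", "acl2"}}  # stale entry written by the delayed handler
pending = deque([r, r])        # follow-up notifications: add acl, remove acl


def process_notification(resource):
    """Re-read the resource's acls from ZooKeeper and overwrite the cache entry."""
    current = set(zk.get(resource, set()))
    if current:
        cache[resource] = current
    else:
        cache.pop(resource, None)


seen_before_drain = set(cache.get(r, set()))  # what an immediate check would see
while pending:
    process_notification(pending.popleft())
seen_after_drain = set(cache.get(r, set()))

print(sorted(seen_before_drain))  # ['acl1', 'acl2']
print(sorted(seen_after_drain))   # []
```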

I will continue to look into this. In the meantime, if this is a continual pain 
for developers, we can remove the last 3 lines of the test, which remove the 
last acl and try to verify that the ZooKeeper path is deleted.

> Transient test failure: `SimpleAclAuthorizerTest`
> -------------------------------------------------
>
>                 Key: KAFKA-2587
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2587
>             Project: Kafka
>          Issue Type: Sub-task
>            Reporter: Ismael Juma
>            Assignee: Parth Brahmbhatt
>             Fix For: 0.9.0.0
>
>
> I've seen `SimpleAclAuthorizerTest` fail a couple of times since its recent 
> introduction. Here's one such build:
> https://builds.apache.org/job/kafka-trunk-git-pr/576/console
> [~parth.brahmbhatt], can you please take a look and see if it's an easy fix?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
