[ 
https://issues.apache.org/jira/browse/KAFKA-13889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17539039#comment-17539039
 ] 

Andrew Grant commented on KAFKA-13889:
--------------------------------------

[~jagsancio] With the solution [~cmccabe] suggested above, and the code I've 
implemented, AclControlManager doesnt need updating. Originally I suggested the 
code that calls removeAcl should catch the exception and continue on. If we had 
gone with this approach we'd need to add that logic in AclControlManager and 
BrokerMetadataPublisher because both places call removeAcl and I figured we 
should be consistent in our error handling. But as [~cmccabe] mentioned above 
this could hide bugs. So in the PR I've confined the changes to AclsDelta such 
that we wont return the delete event to BrokerMetadataPublisher and so we wont 
call removeAcl for an ACL that never got added. So with this approach neither 
AclControlManager nor BrokerMetadataPublisher need updating.

Having said all this, per 
[https://cwiki.apache.org/confluence/display/KAFKA/KIP-595%3A+A+Raft+Protocol+for+the+Metadata+Quorum]
 "Note it is possible that a request could time out before the leader has 
successfully committed the records, and the client or the broker itself would 
retry, which would result in duplicated updates to the quorum. Since in Kafka's 
usage, all updates are overwrites which are idempotent (as the nature of 
configuration is a key-value mapping). Therefore, we do not need to implement 
serial number or request caching to achieve "exactly-once"." I'm wondering if 
we do need to handle duplicates which could result in removeAcl being called 
twice. [~hachikuji] [~cmccabe] Thoughts? 

FWIW I think the PR is still valid because we can still keep the logic in the 
PR regardless and it's strictly an improvement over what we have today. If we 
want to handle duplicate REMOVE_ACCESS_CONTROL_ENTRY_RECORD records I think it 
can be done in a separate, larger PR after more discussion (I can create a Jira 
if so).

> Fix AclsDelta to handle ACCESS_CONTROL_ENTRY_RECORD quickly followed by 
> REMOVE_ACCESS_CONTROL_ENTRY_RECORD for same ACL
> -----------------------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-13889
>                 URL: https://issues.apache.org/jira/browse/KAFKA-13889
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Andrew Grant
>            Priority: Major
>
> In 
> [https://github.com/apache/kafka/blob/trunk/metadata/src/main/java/org/apache/kafka/image/AclsDelta.java#L64]
>  we store the pending deletion in the changes map. This could override a 
> creation that might have just happened. This is an issue because in 
> BrokerMetadataPublisher this results in us making a removeAcl call which 
> finally results in 
> [https://github.com/apache/kafka/blob/trunk/metadata/src/main/java/org/apache/kafka/metadata/authorizer/StandardAuthorizerData.java#L203]
>  being executed and this code throws an exception if the ACL isnt in the Map 
> yet. If the ACCESS_CONTROL_ENTRY_RECORD event never got processed by 
> BrokerMetadataPublisher then the ACL wont be in the Map yet.
> My feeling is we might want to make removeAcl idempotent in that it returns 
> success if the ACL doesn't exist: no matter how many times removeAcl is 
> called it returns success if the ACL is deleted. Maybe we’d just log a 
> warning or something?
> Note, I dont think the AclControlManager has this issue because it doesn't 
> batch the events like AclsDelta does. However, we still do throw a 
> RuntimeException here 
> [https://github.com/apache/kafka/blob/trunk/metadata/src/main/java/org/apache/kafka/controller/AclControlManager.java#L197]
>  - maybe we should still follow the same logic (if we make the fix suggested 
> above) and just log a warning if the ACL doesnt exist in the Map?



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

Reply via email to