I have been working on implementing a Trie structure to store ACLs and improve the performance in the metadata/authorization code. The upshot of this was that I found it very difficult to determine if the implementation was correctly reimplementing the current implementation.
My goal was to simply change the StandardAuthorizerData implementation and leave the rest of the code as is. However, there were no abstracted tests that would allow me to test this. KAFKA-17316 addresses this issue by creating some internal interfaces for the "metadata/authorizer" package. The one change to the StandardAuthorizer was to implement the"authorizeByResourceType" defined in the "org.apache.kafka.server.authorizer.Authorizer" interface by passing the request down to the AuthorizerData implementation. This change allowed me to create three test implementations. One that implemented "authorizeByResourceType" as it is in the released code base, one that verified that the StandardAuthorizerData implementation did not change the expected results, and one that showed the Trie implementation in KAFKA-17423 was also correct. I think that retaining the work in KAFKA-17316 makes sense as when the next faster implementation comes along we can drop in the replacement and verify that it works correctly. KAFKA-17423 builds on KAFKA-17316 by implementing a Trie based AuthorizerData implementation. By splitting the data into a Trie format the search for matching ACLs is improved by an order of magnitude. The trie implementation allows us to quickly locate the candidate ACLs by splitting them into groups based upon the similarity of the resource name. In addition since we are moving through the trie based on resource name we have several advantages: 1. If we encounter a matching DENY while descending the Trie we can stop as it overrides anything that may be found at lower levels. 2. We only look for LITERAL matches on the descent. If we reach a matching resource name or a leaf node we know there are no LITERAL matches. 3. If we don't have a DENY or a LITERAL match we walk back up the path checking the nodes from the descent looking for a PREFIX match. The advantage here is that we don't have to search again but can simply retrace the path using the Trie structure. I believe that #1 and #2 above account for a significant portion of the speed increase as we do not have to reposition within a sorted list of all ACLs using a binary search. Finally, I think that we should prohibit the use of the java.util.stream classes within the authorizer due to hot path speed considerations. The only existing code that uses streams within that package were test cases. We can prohibit the use by a simple checkstyle prohibition. Doing so will short circuit any misguided potential changes. Thoughs? Claude