I have been working on implementing a Trie structure to store ACLs and
improve the performance of the metadata/authorization code.  The upshot
was that I found it very difficult to verify that the new implementation
faithfully reproduced the behaviour of the current one.

My goal was to simply change the StandardAuthorizerData implementation and
leave the rest of the code as is.  However, there were no abstracted tests
that would allow me to test this.

KAFKA-17316 addresses this issue by creating some internal interfaces for
the "metadata/authorizer" package.  The one change to the
StandardAuthorizer was to implement the "authorizeByResourceType" method defined in
the "org.apache.kafka.server.authorizer.Authorizer" interface by passing
the request down to the AuthorizerData implementation.
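To make the intent concrete, the shape of such an internal interface might look something like the following.  This is a simplified sketch; the names, types, and signatures are stand-ins for illustration, not the actual KAFKA-17316 code.

```java
// Simplified stand-in for the kind of internal interface KAFKA-17316
// introduces; the real types and signatures in metadata/authorizer differ.
enum Result { ALLOWED, DENIED }

interface AuthorizerData {
    // the existing per-resource authorization check
    Result authorize(String principal, String operation, String resourceName);

    // pushed down from the Authorizer interface so that alternative data
    // structures (sorted list, trie, ...) can each answer it natively
    Result authorizeByResourceType(String principal, String operation, String resourceType);
}
```

With the data layer behind an interface like this, each candidate implementation can be run against the same abstract test suite.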

This change allowed me to create three test implementations.  One that
implemented "authorizeByResourceType" as it is in the released code base,
one that verified that the StandardAuthorizerData implementation did not
change the expected results, and one that showed the Trie implementation in
KAFKA-17423 was also correct.

I think that retaining the work in KAFKA-17316 makes sense: when the next,
faster implementation comes along, we can drop in the replacement and
verify that it works correctly.

KAFKA-17423 builds on KAFKA-17316 by implementing a Trie-based
AuthorizerData implementation.  Splitting the data into a Trie improves
the search for matching ACLs by an order of magnitude.  The trie lets us
quickly locate the candidate ACLs by grouping them according to shared
resource-name prefixes.  In addition, since we move through the trie based
on the resource name, we gain several advantages:

   1. If we encounter a matching DENY while descending the Trie we can stop
   immediately, as it overrides anything that may be found at lower levels.
   2. We only look for LITERAL matches on the descent.  Once we reach the
   node for the full resource name, or fall off the trie at a leaf, we know
   whether a LITERAL match exists.
   3. If we find neither a DENY nor a LITERAL match, we walk back up the
   path from the descent, checking each node for a PREFIX match.  The
   advantage is that we do not have to search again; we simply retrace the
   path using the Trie structure.
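The descent/walk-back steps above can be sketched roughly as follows.  This is a hypothetical illustration only: the AclTrie and Decision names, the single-character fan-out, and the simplified ALLOW/DENY model are my own stand-ins, not the actual KAFKA-17423 code.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of the descent/walk-back idea; not Kafka's actual
// StandardAuthorizerData or the KAFKA-17423 implementation.
enum Decision { ALLOW, DENY, NONE }

class AclTrie {

    private static class Node {
        final Map<Character, Node> children = new HashMap<>();
        Decision literal = Decision.NONE; // LITERAL ACL ending exactly here
        Decision prefix = Decision.NONE;  // PREFIXED ACL covering this subtree
    }

    private final Node root = new Node();

    void addLiteral(String resource, Decision d) { insert(resource).literal = d; }

    void addPrefix(String resource, Decision d) { insert(resource).prefix = d; }

    private Node insert(String resource) {
        Node n = root;
        for (char c : resource.toCharArray()) {
            n = n.children.computeIfAbsent(c, k -> new Node());
        }
        return n;
    }

    Decision authorize(String resource) {
        // Record the path so we can retrace it for PREFIX matches later.
        Node[] path = new Node[resource.length() + 1];
        path[0] = root;
        int depth = 0;
        // (1) Descend. A matching DENY prefix overrides anything deeper,
        // so we can stop immediately.
        while (depth < resource.length()) {
            if (path[depth].prefix == Decision.DENY) {
                return Decision.DENY;
            }
            Node next = path[depth].children.get(resource.charAt(depth));
            if (next == null) {
                break; // fell off the trie: no LITERAL match is possible
            }
            path[++depth] = next;
        }
        // (2) A LITERAL match is only possible if we consumed the whole name.
        if (depth == resource.length()) {
            Node last = path[depth];
            if (last.prefix == Decision.DENY || last.literal == Decision.DENY) {
                return Decision.DENY;
            }
            if (last.literal == Decision.ALLOW) {
                return Decision.ALLOW;
            }
        }
        // (3) No DENY and no LITERAL match: retrace the recorded path upward
        // looking for an ALLOW prefix (all DENY prefixes on the path were
        // already ruled out during the descent), with no fresh search needed.
        for (int i = depth; i >= 0; i--) {
            if (path[i].prefix == Decision.ALLOW) {
                return Decision.ALLOW;
            }
        }
        return Decision.NONE;
    }
}
```

Note how the early DENY return in step (1) and the single end-of-name LITERAL check in step (2) bound the work to one pass over the resource name, rather than repositioning in a sorted list of all ACLs.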

I believe that #1 and #2 above account for a significant portion of the
speed increase as we do not have to reposition within a sorted list of all
ACLs using a binary search.

Finally, I think that we should prohibit the use of the java.util.stream
classes within the authorizer for hot-path performance reasons.  The only
existing code that used streams within that package was in test cases.  We
can enforce the ban with a simple checkstyle rule, which will short-circuit
any misguided future changes.
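For reference, one way to express such a ban is checkstyle's IllegalImport check.  This is only a sketch; the actual rule would live in Kafka's existing checkstyle configuration and would need to be scoped to the metadata/authorizer package.

```xml
<!-- Sketch only: assumes checkstyle's IllegalImport module; scoping to
     the metadata/authorizer package is left to the build configuration. -->
<module name="IllegalImport">
  <property name="illegalPkgs" value="java.util.stream"/>
</module>
```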

Thoughts?
Claude