ableegoldman commented on a change in pull request #10509:
URL: https://github.com/apache/kafka/pull/10509#discussion_r622472215



##########
File path: clients/src/main/java/org/apache/kafka/clients/consumer/internals/AbstractStickyAssignor.java
##########
@@ -263,16 +279,59 @@ private boolean allSubscriptionsEqual(Set<String> allTopics,
         if (log.isDebugEnabled()) {
             log.debug("final assignment: " + assignment);
         }
-
+        
         return assignment;
     }
 
-    private SortedSet<TopicPartition> getTopicPartitions(Map<String, Integer> partitionsPerTopic) {
-        SortedSet<TopicPartition> allPartitions =
-            new TreeSet<>(Comparator.comparing(TopicPartition::topic).thenComparing(TopicPartition::partition));
-        for (Entry<String, Integer> entry: partitionsPerTopic.entrySet()) {
-            String topic = entry.getKey();
-            for (int i = 0; i < entry.getValue(); ++i) {
+    /**
+     * Get the unassigned partition list by computing the set difference of sortedPartitions (all partitions)
+     * and sortedToBeRemovedPartitions. We use the two-pointer technique here:
+     *
+     * We loop over sortedPartitions and compare each element against the ith element of sortedToBeRemovedPartitions (i starts from 0):
+     *   - if not equal to the ith element, add it to unassignedPartitions
+     *   - if equal to the ith element, advance to the next element of sortedToBeRemovedPartitions
+     *
+     * @param sortedPartitions: sorted list of all partitions
+     * @param sortedToBeRemovedPartitions: sorted partitions, all of which are included in sortedPartitions
+     * @return the partitions not assigned to any current consumer
+     */
+    private List<TopicPartition> getUnassignedPartitions(List<TopicPartition> sortedPartitions,
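
For context, a minimal standalone sketch of the two-pointer set difference the javadoc above describes, using plain Comparable elements instead of TopicPartition. The class and method names are illustrative only, not the PR's actual code:

    import java.util.ArrayList;
    import java.util.List;

    public class TwoPointerDifference {
        // Returns the elements of sortedAll that do not appear in sortedToRemove.
        // Both lists must be sorted the same way, and sortedToRemove must be a
        // subset of sortedAll -- the same preconditions the javadoc states.
        static <T extends Comparable<T>> List<T> difference(List<T> sortedAll, List<T> sortedToRemove) {
            List<T> unassigned = new ArrayList<>(sortedAll.size() - sortedToRemove.size());
            int removeIndex = 0;
            for (T element : sortedAll) {
                if (removeIndex < sortedToRemove.size()
                        && element.compareTo(sortedToRemove.get(removeIndex)) == 0) {
                    removeIndex++;              // matched: skip it and advance the removal pointer
                } else {
                    unassigned.add(element);    // not scheduled for removal: keep it
                }
            }
            return unassigned;
        }

        public static void main(String[] args) {
            System.out.println(difference(List.of(1, 2, 3, 4, 5, 6), List.of(2, 5))); // [1, 3, 4, 6]
        }
    }

The whole difference is computed in a single O(n) pass over the sorted input, which is the property the review discussion below is weighing against a TreeSet-based approach.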

Review comment:
       Thanks for getting some concrete numbers to work with! I suspected the theory would not match the reality, primarily due to caching, although I wasn't aware of the improved runtime of sorting a partially-ordered list. That's good to know 😄 And it does make sense in hindsight given the nature of the sorting algorithm.
   
   I've always found that the real-world performance of arrays on any reasonable caching architecture, compared to theoretically better data structures/algorithms, is one of those things people know and still subconsciously doubt. Probably because most people spent four years in college getting theoretical algorithmic runtimes drilled into their heads, and far less time looking at the underlying architecture that runs those algorithms and applies its own optimizations under the hood. It's an interesting psychological observation. There's a great talk on it somewhere, but I can't remember the name.
   
   Anyways, you just never know until you run the numbers. I'm sure this may vary somewhat with different input parameters, but I'm convinced: let's stick with this improvement. If someone starts complaining about the memory consumption, we can always go back and look for ways to cut down. Thanks for the enlightening discussion.
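
As a rough way to actually "run the numbers", a throwaway sketch along these lines (hypothetical, not part of the PR; the class name RoughTiming and the input shape are made up, and a real comparison would use JMH with the actual assignor inputs) illustrates why sorting a mostly-ordered ArrayList can be so cheap compared to building the theoretically comparable tree structure:

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;
    import java.util.TreeSet;

    public class RoughTiming {
        public static void main(String[] args) {
            final int n = 1_000_000;

            // Mostly-ordered input: TimSort (behind Collections.sort) exploits the
            // existing runs, so this sort is far cheaper than sorting random data.
            List<Integer> mostlySorted = new ArrayList<>(n);
            for (int i = 0; i < n; i++) {
                mostlySorted.add(i % 1000 == 0 ? -i : i);
            }

            long t0 = System.nanoTime();
            Collections.sort(mostlySorted);
            long sortMs = (System.nanoTime() - t0) / 1_000_000;

            // The alternative with the same big-O ordering guarantees: a red-black tree.
            long t1 = System.nanoTime();
            TreeSet<Integer> tree = new TreeSet<>(mostlySorted);
            long treeMs = (System.nanoTime() - t1) / 1_000_000;

            // Rough numbers only: results vary with hardware, JIT warm-up, and input
            // shape; a proper benchmark would use JMH.
            System.out.println("sort mostly-ordered ArrayList: " + sortMs + " ms (" + mostlySorted.size() + " elements)");
            System.out.println("build TreeSet:                 " + treeMs + " ms (" + tree.size() + " elements)");
        }
    }

The array version also keeps the elements contiguous in memory, which is where the caching effect discussed above comes in; the tree pays for pointer chasing on every comparison.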



