mxm commented on code in PR #879:
URL: https://github.com/apache/flink-kubernetes-operator/pull/879#discussion_r1754969067
##########
flink-autoscaler/src/main/java/org/apache/flink/autoscaler/JobVertexScaler.java:
##########
@@ -378,28 +405,70 @@ protected static int scale(
         // Cap parallelism at either maxParallelism (number of key groups or source partitions) or
         // parallelism upper limit
-        final int upperBound = Math.min(maxParallelism, parallelismUpperLimit);
+        int upperBound = Math.min(maxParallelism, parallelismUpperLimit);

         // Apply min/max parallelism
         newParallelism = Math.min(Math.max(parallelismLowerLimit, newParallelism), upperBound);

         var adjustByMaxParallelism =
                 inputShipStrategies.isEmpty() || inputShipStrategies.contains(HASH);
         if (!adjustByMaxParallelism) {
-            return newParallelism;
+            return Tuple2.of(newParallelism, Optional.empty());
         }

-        // When the shuffle type of vertex inputs contains keyBy or vertex is a source, we try to
-        // adjust the parallelism such that it divides the maxParallelism without a remainder
-        // => data is evenly spread across subtasks
-        for (int p = newParallelism; p <= maxParallelism / 2 && p <= upperBound; p++) {
-            if (maxParallelism % p == 0) {
-                return p;
+        if (numPartitions <= 0) {
+            // When the shuffle type of vertex inputs contains keyBy or vertex is a source,
+            // we try to adjust the parallelism such that it divides the maxParallelism without a
+            // remainder => data is evenly spread across subtasks
+            for (int p = newParallelism; p <= maxParallelism / 2 && p <= upperBound; p++) {
+                if (maxParallelism % p == 0) {
+                    return Tuple2.of(p, Optional.empty());
+                }
+            }
+            // If parallelism adjustment fails, use originally computed parallelism
+            return Tuple2.of(newParallelism, Optional.empty());
+        } else {
+
+            // When we know the numPartitions at a vertex,
+            // adjust the parallelism such that it divides the numPartitions without a remainder
+            // => data is evenly distributed among subtasks
+            for (int p = newParallelism; p <= upperBound && p <= numPartitions; p++) {
+                if (numPartitions % p == 0) {
+                    return Tuple2.of(p, Optional.empty());
+                }
             }
-        }
-        // If parallelism adjustment fails, use originally computed parallelism
-        return newParallelism;
+            // When the parallelism after rounding up cannot evenly divide the source partition
+            // count, try to find the smallest parallelism that can satisfy the current
+            // consumption rate.
+            for (int p = newParallelism; p > parallelismLowerLimit; p--) {
+                if (numPartitions / p > numPartitions / newParallelism) {
+                    if (numPartitions % p != 0) {
+                        p += 1;
+                    }

Review Comment:
Thanks for explaining in detail. I misread some of the code. It is correct that we need to add +1 once we have found a parallelism that yields a greater value for `numPartitions / p` than the initial `numPartitions / newParallelism`, because that is the tipping point where we achieve the highest utilization in terms of partitions per task.

I think we should return `newParallelism` if all of the adaptation logic fails, because falling back to a potentially very small configured lower parallelism could make things a lot worse due to resource constraints.
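To make the tipping-point argument concrete, here is a minimal, self-contained sketch of the fallback loop under discussion. It is not the PR's actual code: the class and method names are made up, the real `scale` method returns a `Tuple2` with an optional event message, and since the quoted hunk cuts off right after `p += 1`, the return below is an assumption based on the discussion above.

```java
// Hypothetical sketch of the partition-aware fallback discussed above;
// not the actual JobVertexScaler code.
public class TippingPointSketch {

    static int smallestSufficientParallelism(
            int numPartitions, int newParallelism, int parallelismLowerLimit) {
        // Partitions per subtask (integer division) at the computed parallelism.
        int basePartitionsPerTask = numPartitions / newParallelism;
        for (int p = newParallelism; p > parallelismLowerLimit; p--) {
            if (numPartitions / p > basePartitionsPerTask) {
                // We just crossed the tipping point: p is the largest parallelism
                // whose subtasks consume more partitions than at newParallelism.
                // If p does not divide the partition count evenly, step back to
                // p + 1, the smallest parallelism that preserves the original
                // partitions-per-task utilization.
                if (numPartitions % p != 0) {
                    p += 1;
                }
                return p; // assumed return; the quoted hunk is truncated here
            }
        }
        // If all adaptation logic fails, fall back to the originally computed
        // parallelism instead of the configured lower limit, which may be far
        // too small under resource constraints.
        return newParallelism;
    }

    public static void main(String[] args) {
        // 32 partitions, computed parallelism 24: 32 / 24 = 1 partition per
        // task; p = 16 divides 32 evenly, so it is accepted directly.
        System.out.println(smallestSufficientParallelism(32, 24, 1)); // 16
        // 33 partitions, computed parallelism 24: 33 / 16 = 2 > 1 and
        // 33 % 16 != 0, so we step back to 17, where 33 / 17 = 1 again.
        System.out.println(smallestSufficientParallelism(33, 24, 1)); // 17
    }
}
```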
##########
flink-autoscaler/src/main/java/org/apache/flink/autoscaler/JobVertexScaler.java:
##########
@@ -191,16 +200,29 @@ public ParallelismChange computeScaleTargetParallelism(
         double cappedTargetCapacity = averageTrueProcessingRate * scaleFactor;
         LOG.debug("Capped target processing capacity for {} is {}", vertex, cappedTargetCapacity);

-        int newParallelism =
+        Tuple2<Integer, Optional<String>> newParallelism =
                 scale(

Review Comment:
Fine with me. Alternatively, for tests, we could also pass in the test implementation of the event handler, which allows inspecting the generated events.

##########
flink-autoscaler/src/main/java/org/apache/flink/autoscaler/JobVertexScaler.java:
##########
@@ -378,28 +405,70 @@ protected static int scale(
         // Cap parallelism at either maxParallelism (number of key groups or source partitions) or
         // parallelism upper limit
-        final int upperBound = Math.min(maxParallelism, parallelismUpperLimit);
+        int upperBound = Math.min(maxParallelism, parallelismUpperLimit);

         // Apply min/max parallelism
         newParallelism = Math.min(Math.max(parallelismLowerLimit, newParallelism), upperBound);

         var adjustByMaxParallelism =
                 inputShipStrategies.isEmpty() || inputShipStrategies.contains(HASH);
         if (!adjustByMaxParallelism) {
-            return newParallelism;
+            return Tuple2.of(newParallelism, Optional.empty());
         }

-        // When the shuffle type of vertex inputs contains keyBy or vertex is a source, we try to
-        // adjust the parallelism such that it divides the maxParallelism without a remainder
-        // => data is evenly spread across subtasks
-        for (int p = newParallelism; p <= maxParallelism / 2 && p <= upperBound; p++) {
-            if (maxParallelism % p == 0) {
-                return p;
+        if (numPartitions <= 0) {
+            // When the shuffle type of vertex inputs contains keyBy or vertex is a source,
+            // we try to adjust the parallelism such that it divides the maxParallelism without a
+            // remainder => data is evenly spread across subtasks
+            for (int p = newParallelism; p <= maxParallelism / 2 && p <= upperBound; p++) {
+                if (maxParallelism % p == 0) {
+                    return Tuple2.of(p, Optional.empty());
+                }
+            }
+            // If parallelism adjustment fails, use originally computed parallelism
+            return Tuple2.of(newParallelism, Optional.empty());
+        } else {
+
+            // When we know the numPartitions at a vertex,
+            // adjust the parallelism such that it divides the numPartitions without a remainder
+            // => data is evenly distributed among subtasks
+            for (int p = newParallelism; p <= upperBound && p <= numPartitions; p++) {
+                if (numPartitions % p == 0) {
+                    return Tuple2.of(p, Optional.empty());
+                }
             }

Review Comment:
Right, I missed that. I was trying to generalize the two code blocks. How about the following?
```suggestion
        if (numPartitions <= 0) {
            upperBound = Math.min(maxParallelism / 2, upperBound);
        } else {
            upperBound = Math.min(numPartitions, upperBound);
            maxParallelism = numPartitions;
        }
        for (int p = newParallelism; p <= upperBound; p++) {
            if (maxParallelism % p == 0) {
                return Tuple2.of(p, Optional.empty());
            }
        }

        ...
        // Resource optimization logic follows (if we can't achieve optimal partitioning)
        // (See review comment below)
        ...

        // If parallelism adjustment fails, use originally computed parallelism
        return Tuple2.of(newParallelism, Optional.empty());
```
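For completeness, here is a self-contained sketch of how the suggested generalization could read once the tipping-point fallback from the earlier comment is folded into the elided section. All names and the simplified `int` return are illustrative (the real method returns a `Tuple2<Integer, Optional<String>>`); this is a sketch of the suggestion, not the merged code.

```java
// Illustrative sketch combining the generalized divisor search from the
// suggestion above with the tipping-point fallback; not the merged PR code.
public class UnifiedScaleSketch {

    static int adjustParallelism(
            int newParallelism,
            int maxParallelism,
            int numPartitions,
            int upperBound,
            int parallelismLowerLimit) {
        // Unify the two cases: search against the key-group count when the
        // partition count is unknown, otherwise against the partition count.
        if (numPartitions <= 0) {
            upperBound = Math.min(maxParallelism / 2, upperBound);
        } else {
            upperBound = Math.min(numPartitions, upperBound);
            maxParallelism = numPartitions;
        }
        // Prefer the first parallelism >= the computed one that divides the
        // bound without a remainder => data is evenly spread across subtasks.
        for (int p = newParallelism; p <= upperBound; p++) {
            if (maxParallelism % p == 0) {
                return p;
            }
        }
        // Resource optimization fallback for known partition counts: find the
        // smallest parallelism keeping the computed partitions-per-task ratio.
        if (numPartitions > 0) {
            for (int p = newParallelism; p > parallelismLowerLimit; p--) {
                if (numPartitions / p > numPartitions / newParallelism) {
                    return numPartitions % p == 0 ? p : p + 1;
                }
            }
        }
        // If parallelism adjustment fails, use originally computed parallelism.
        return newParallelism;
    }

    public static void main(String[] args) {
        // Key-group case: 128 key groups, computed parallelism 10 -> 16.
        System.out.println(adjustParallelism(10, 128, -1, 100, 1));
        // Partition case: 12 partitions, computed parallelism 5 -> 6.
        System.out.println(adjustParallelism(5, 128, 12, 100, 1));
        // Fallback case: 7 partitions, upper bound 6, computed parallelism 5;
        // no divisor of 7 in [5, 6], so the fallback yields 4 (7 / 4 = 1).
        System.out.println(adjustParallelism(5, 128, 7, 6, 1));
    }
}
```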