Re: [PR] [FLINK-34504][autoscaler] Avoid the parallelism adjustment when the upstream shuffle type doesn't have keyBy [flink-kubernetes-operator]

via GitHub Mon, 26 Feb 2024 03:05:27 -0800


mxm commented on code in PR #783:
URL: https://github.com/apache/flink-kubernetes-operator/pull/783#discussion_r1502427965



##########
flink-autoscaler/src/main/java/org/apache/flink/autoscaler/JobVertexScaler.java:
##########
@@ -248,25 +250,22 @@ private boolean detectIneffectiveScaleUp(
     }
 
     /**
-     * Compute newParallelism according to currentParallelism.
+     * Computing the newParallelism. In general, newParallelism = currentParallelism * scaleFactor.
+     * But we limit newParallelism between parallelismLowerLimit and min(parallelismUpperLimit,
+     * maxParallelism).
      *
-     * @param currentParallelism The current parallelism.
-     * @param maxParallelism The max parallelism for job vertices. It's numKeyGroups by default, and
-     *     it's partition number for kafka source vertex.
-     * @param scaleFactor The scale factor.
-     * @param parallelismLowerLimit The parallelism lower limitation in autoscaler option.
-     * @param parallelismUpperLimit The parallelism upper limitation in autoscaler option.
-     * @param adjustByMaxParallelism True means we need to adjust parallelism according to the
-     *     maxParallelism to ensure the keyGroup or partition evenly.
+     * <p>Also, in order to ensure the data is evenly spread across subtasks, we try to adjust the
+     * parallelism for source and keyed vertex such that it divides the maxParallelism without a
+     * remainder.
      */
     @VisibleForTesting
     protected static int scale(
             int currentParallelism,
+            Map<JobVertexID, ShipStrategy> inputs,

Review Comment:
   This could be reduced to just `Collection<ShipStrategy>`; we don't need the vertex id.
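
   For illustration only, here is a minimal sketch of how `scale` might look with a `Collection<ShipStrategy>` parameter, following the Javadoc in the hunk above. It is not the code in this PR: the `ScaleSketch` class, the three-value `ShipStrategy` enum, the `inputShipStrategies` parameter name, the rounding with `Math.ceil`, the floor of 1, and the upward divisor search are all assumptions made for the example.

```java
import java.util.Collection;

public class ScaleSketch {

    /** Hypothetical stand-in for the autoscaler's ShipStrategy; only the values used here. */
    enum ShipStrategy { HASH, REBALANCE, FORWARD }

    static int scale(
            int currentParallelism,
            Collection<ShipStrategy> inputShipStrategies,
            int maxParallelism,
            double scaleFactor,
            int parallelismLowerLimit,
            int parallelismUpperLimit) {
        // newParallelism = currentParallelism * scaleFactor, limited to
        // [parallelismLowerLimit, min(parallelismUpperLimit, maxParallelism)].
        int upperBound = Math.min(parallelismUpperLimit, maxParallelism);
        int target = (int) Math.ceil(currentParallelism * scaleFactor);
        int newParallelism =
                Math.max(1, Math.max(parallelismLowerLimit, Math.min(upperBound, target)));

        // Assumption: a vertex with no inputs is a source, and a vertex with at
        // least one HASH input is keyed. Only those are adjusted.
        boolean adjust =
                inputShipStrategies.isEmpty()
                        || inputShipStrategies.contains(ShipStrategy.HASH);
        if (!adjust) {
            return newParallelism;
        }

        // Look for the next parallelism that divides maxParallelism without a
        // remainder, staying within the upper bound.
        for (int p = newParallelism; p <= upperBound; p++) {
            if (maxParallelism % p == 0) {
                return p;
            }
        }
        // No suitable divisor found; keep the clamped value.
        return newParallelism;
    }
}
```

   Only the ship strategies are consulted when deciding whether to adjust, so the vertex ids are never read.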


