[GitHub] [flink] zhuzhurk commented on a change in pull request #13422: [FLINK-19286][runtime] Improve pipelined region scheduling performance

GitBox Mon, 21 Sep 2020 20:48:47 -0700


zhuzhurk commented on a change in pull request #13422:
URL: https://github.com/apache/flink/pull/13422#discussion_r491878476




##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/scheduler/strategy/SchedulingStrategyUtils.java
##########
@@ -81,6 +82,14 @@
                        final SchedulingTopology topology,
                        final Set<SchedulingPipelinedRegion> regions) {
 
+               // Avoid the O(V) (V is the number of vertices in the topology) sorting
+               // complexity if the given set of regions is small enough
+               if (regions.size() == 0) {
+                       return Collections.emptyList();
+               } else if (regions.size() == 1) {
+                       return Collections.singletonList(regions.iterator().next());
+               }
+
                return IterableUtils.toStream(topology.getVertices())

Review comment:
       I think we can keep it as is, for simplicity, because it is not the 
critical part. 
   The reworked Java streams are in the per-region code path, while this one is 
invoked in the per-event code path. But even the reworked loops are not the most 
critical part, because they are only O(V).
   The other 2 commits, which fix the vertex/region sorting, reduce the 
complexity from O(V^2) to O(V) for each valid triggering event.
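
   For illustration, the early return in the hunk above can be sketched in 
isolation. This is a minimal sketch, not the Flink code: the class 
`SmallSetFastPath`, the method `sortInTopologicalOrder`, and the use of a plain 
`List` as the topology order are hypothetical stand-ins.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in, not part of Flink.
final class SmallSetFastPath {

    // Returns the elements of `subset` in the order in which they appear in
    // `topologyOrder` (assumed to already be topologically sorted).
    static <T> List<T> sortInTopologicalOrder(List<T> topologyOrder, Set<T> subset) {
        // Zero or one element is trivially sorted, so skip the O(V) scan.
        if (subset.isEmpty()) {
            return Collections.emptyList();
        } else if (subset.size() == 1) {
            return Collections.singletonList(subset.iterator().next());
        }

        // General case: one pass over the whole topology, O(V).
        final List<T> sorted = new ArrayList<>(subset.size());
        for (T element : topologyOrder) {
            if (subset.contains(element)) {
                sorted.add(element);
            }
        }
        return sorted;
    }
}
```

   With this shape, a call with a single region never touches the topology, 
which is the case the fast path targets.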

##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/scheduler/strategy/PipelinedRegionSchedulingStrategy.java
##########
@@ -127,13 +136,9 @@ private void maybeScheduleRegion(final SchedulingPipelinedRegion region) {
 
                checkState(areRegionVerticesAllInCreatedState(region), "BUG: trying to schedule a region which is not in CREATED state");
 
-               final Set<ExecutionVertexID> verticesToSchedule = IterableUtils.toStream(region.getVertices())
-                       .map(SchedulingExecutionVertex::getId)
-                       .collect(Collectors.toSet());
                final List<ExecutionVertexDeploymentOption> vertexDeploymentOptions =
-                       SchedulingStrategyUtils.createExecutionVertexDeploymentOptionsInTopologicalOrder(
-                               schedulingTopology,
-                               verticesToSchedule,
+                       SchedulingStrategyUtils.createExecutionVertexDeploymentOptions(

Review comment:
       I think it's not needed. The sorting is not a problem for the other 
strategies, where it is invoked once per triggering event, whereas for 
pipelined region scheduling it is invoked multiple times, once per region.
   Doing it like this would also require adding a sorted region vertex map for 
the other strategies, which would be an unnecessary complication.
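
   To make the cost difference concrete, here is a rough sketch under 
simplifying assumptions: vertices are plain integers, every region holds 
exactly one vertex (a worst case), and the class `PerRegionSortCost` and its 
methods are hypothetical, not Flink code. It only counts how many vertices each 
approach visits.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in, not part of Flink.
final class PerRegionSortCost {

    // Sorting per region by scanning the whole topology: R scans of V vertices.
    static long visitsWithFullScanPerRegion(int numVertices, List<Set<Integer>> regions) {
        long visits = 0;
        for (Set<Integer> region : regions) {
            for (int v = 0; v < numVertices; v++) {
                visits++; // a real implementation would filter on region.contains(v) here
            }
        }
        return visits;
    }

    // Iterating only the vertices of each region: V visits in total.
    static long visitsWithRegionOnlyIteration(List<Set<Integer>> regions) {
        long visits = 0;
        for (Set<Integer> region : regions) {
            for (Integer ignored : region) {
                visits++;
            }
        }
        return visits;
    }

    // Builds the assumed worst case: one region per vertex.
    static List<Set<Integer>> singleVertexRegions(int numVertices) {
        final List<Set<Integer>> regions = new ArrayList<>(numVertices);
        for (int v = 0; v < numVertices; v++) {
            regions.add(Collections.singleton(v));
        }
        return regions;
    }
}
```

   For 1000 single-vertex regions the first method counts 1,000,000 visits and 
the second counts 1000, which matches the O(V^2) versus O(V) difference 
described above.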



