zhuzhurk commented on a change in pull request #13422:
URL: https://github.com/apache/flink/pull/13422#discussion_r491878476
##########
File path:
flink-runtime/src/main/java/org/apache/flink/runtime/scheduler/strategy/SchedulingStrategyUtils.java
##########
@@ -81,6 +82,14 @@
final SchedulingTopology topology,
final Set<SchedulingPipelinedRegion> regions) {
+ // Avoid the O(V) (V is the number of vertices in the topology) sorting
+ // complexity if the given set of regions is small enough
+ if (regions.size() == 0) {
+ return Collections.emptyList();
+ } else if (regions.size() == 1) {
+ return Collections.singletonList(regions.iterator().next());
+ }
+
return IterableUtils.toStream(topology.getVertices())
Review comment:
I think we can keep it as is, for simplicity, because it is not the
critical part.
The reworked Java streams are in the per-region code path, while this one is
invoked in the per-event code path. But even the reworked loops are not the
most critical part, because they have only O(V) complexity.
The other 2 commits, which fix the vertex/region sorting, reduce the
complexity from O(V^2) to O(V) for each valid triggering event.
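To make the effect of the fast path in the hunk above concrete, here is a minimal sketch. It is not the Flink code: the class `RegionSortSketch`, the `int[] regionOf` model (vertex i belongs to region `regionOf[i]`, with vertices already in topological order), and the `visited` counter are all hypothetical and exist only to count how many vertices a call touches.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical, simplified model of region sorting; not the Flink implementation. */
final class RegionSortSketch {

    /** Running count of topology vertices visited, for illustration only. */
    static long visited = 0;

    /**
     * Orders the requested regions by the first appearance of one of their
     * vertices. regionOf[i] is the region id of vertex i (an assumed model).
     */
    static List<Integer> sortRegions(int[] regionOf, Set<Integer> regions) {
        // Fast path: zero or one region needs no ordering, so skip the O(V) scan.
        if (regions.isEmpty()) {
            return Collections.emptyList();
        } else if (regions.size() == 1) {
            return Collections.singletonList(regions.iterator().next());
        }

        // General path: one pass over all vertices, keeping insertion order.
        Set<Integer> sorted = new LinkedHashSet<>();
        for (int region : regionOf) {
            visited++;
            if (regions.contains(region)) {
                sorted.add(region);
            }
        }
        return new ArrayList<>(sorted);
    }
}
```

With this model, a call for a single region touches no vertices at all, while a call for several regions still costs one pass over the topology.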
##########
File path:
flink-runtime/src/main/java/org/apache/flink/runtime/scheduler/strategy/PipelinedRegionSchedulingStrategy.java
##########
@@ -127,13 +136,9 @@ private void maybeScheduleRegion(final SchedulingPipelinedRegion region) {
checkState(areRegionVerticesAllInCreatedState(region), "BUG: trying to schedule a region which is not in CREATED state");
- final Set<ExecutionVertexID> verticesToSchedule = IterableUtils.toStream(region.getVertices())
- .map(SchedulingExecutionVertex::getId)
- .collect(Collectors.toSet());
final List<ExecutionVertexDeploymentOption> vertexDeploymentOptions =
- SchedulingStrategyUtils.createExecutionVertexDeploymentOptionsInTopologicalOrder(
- schedulingTopology,
- verticesToSchedule,
+ SchedulingStrategyUtils.createExecutionVertexDeploymentOptions(
Review comment:
I think it's not needed, because the sorting is not a problem for other
strategies, where it is invoked once per triggering event. For pipelined
region scheduling, however, it is invoked multiple times, once for each region.
Doing it like this would also require adding a sorted region vertex map for
the other strategies, which would be an unnecessary complication.
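As a rough illustration of why the per-region call site matters, the sketch below counts vertex visits for the two approaches. Everything in it is hypothetical (the class `PerRegionCostSketch`, integer vertex ids, regions as lists), and it assumes each region's own vertices are already in topological order; it is not the actual Flink code.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical cost model; vertices are the ints 0..vertexCount-1. */
final class PerRegionCostSketch {

    /** Filters the whole topology once per region: O(V) per call, O(V * R) overall. */
    static long visitsWithTopologyFilter(List<List<Integer>> regions, int vertexCount) {
        long visits = 0;
        for (List<Integer> region : regions) {
            Set<Integer> wanted = new HashSet<>(region);
            List<Integer> ordered = new ArrayList<>();
            for (int v = 0; v < vertexCount; v++) {
                visits++;
                if (wanted.contains(v)) {
                    ordered.add(v);
                }
            }
        }
        return visits;
    }

    /** Walks only each region's own vertices (assumed pre-sorted): O(V) overall. */
    static long visitsWithRegionIteration(List<List<Integer>> regions) {
        long visits = 0;
        for (List<Integer> region : regions) {
            List<Integer> ordered = new ArrayList<>();
            for (int v : region) {
                visits++;
                ordered.add(v);
            }
        }
        return visits;
    }

    /** Builds vertexCount single-vertex regions, the worst case where R equals V. */
    static List<List<Integer>> singletonRegions(int vertexCount) {
        List<List<Integer>> regions = new ArrayList<>();
        for (int v = 0; v < vertexCount; v++) {
            regions.add(List.of(v));
        }
        return regions;
    }
}
```

For 100 single-vertex regions, the first method visits 100 * 100 vertices and the second visits 100, which is the O(V^2) versus O(V) gap. A strategy that sorts only once per triggering event pays the O(V) scan a single time, so it does not see this blow-up.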