[
https://issues.apache.org/jira/browse/FLINK-15249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17048840#comment-17048840
]
Zhu Zhu commented on FLINK-15249:
---------------------------------
[~nppoly] Sorry for the late response. I just checked the PR and ran the test
again.
It looks to me that this change aims to improve the region building
performance for a specific topology that is rare in production. However, the
performance for the most common topologies becomes worse: I tested a
4000x4000 All-to-All pipelined connected topology, and region building with the
new change is much slower (1570ms vs. 929ms).
I think we should not introduce a regression in the common cases to improve a
corner case, so I would suggest not making this change.
It is worth mentioning that the set merging cost should not be the critical
part of region building if there are All-to-All connections, since the edge
iteration complexity is much larger (V^2 compared to V). If there is no
All-to-All connection, the region building time is usually low and not a
problem.
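To illustrate the point about edge iteration, here is a minimal sketch of a disjoint-set (union-find) applied to an All-to-All topology. It is not the code from the PR or from Flink; the class name RegionUnionFind, the method connectAllToAll, and the representation of vertices as plain int indices are all assumptions made for illustration.

```java
/** Hypothetical disjoint-set over vertex indices 0..n-1; not Flink's actual code. */
final class RegionUnionFind {
    private final int[] parent;
    private final int[] size;
    private int regionCount;

    RegionUnionFind(int vertexCount) {
        parent = new int[vertexCount];
        size = new int[vertexCount];
        regionCount = vertexCount;
        for (int i = 0; i < vertexCount; i++) {
            parent[i] = i; // every vertex starts as its own region
            size[i] = 1;
        }
    }

    /** Returns the representative of v and compresses the path to it. */
    int find(int v) {
        int root = v;
        while (parent[root] != root) {
            root = parent[root];
        }
        while (parent[v] != root) {
            int next = parent[v];
            parent[v] = root;
            v = next;
        }
        return root;
    }

    /** Merges the sets of a and b (union by size); true if they were separate. */
    boolean union(int a, int b) {
        int ra = find(a);
        int rb = find(b);
        if (ra == rb) {
            return false;
        }
        if (size[ra] < size[rb]) {
            int tmp = ra;
            ra = rb;
            rb = tmp;
        }
        parent[rb] = ra; // attach the smaller tree under the larger one
        size[ra] += size[rb];
        regionCount--;
        return true;
    }

    int regionCount() {
        return regionCount;
    }

    /**
     * Assumed layout: producers are 0..producers-1, consumers follow them.
     * Visits every pipelined edge once and returns the number of edges visited.
     */
    static long connectAllToAll(RegionUnionFind uf, int producers, int consumers) {
        long edgesVisited = 0;
        for (int p = 0; p < producers; p++) {
            for (int c = 0; c < consumers; c++) {
                uf.union(p, producers + c);
                edgesVisited++;
            }
        }
        return edgesVisited;
    }
}
```

For a 4000x4000 All-to-All topology this loop visits 16,000,000 edges, while at most 7,999 of the union calls actually merge two sets. Making each merge cheaper therefore does not reduce the dominant cost, which is walking the edges.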
> Improve PipelinedRegions calculation with Union Set
> ---------------------------------------------------
>
> Key: FLINK-15249
> URL: https://issues.apache.org/jira/browse/FLINK-15249
> Project: Flink
> Issue Type: Improvement
> Components: Runtime / Coordination
> Reporter: Chongchen Chen
> Priority: Major
> Labels: pull-request-available
> Attachments: PipelinedRegionComputeUtil.diff,
> RegionFailoverPerfTest.java, new.diff
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> The merge cost of a union set (disjoint-set) is close to O(1) amortized,
> while the current implementation is O(N). The patch is attached.
> Disjoint-set data structure:
> https://en.wikipedia.org/wiki/Disjoint-set_data_structure