[
https://issues.apache.org/jira/browse/CALCITE-3827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17053864#comment-17053864
]
Liya Fan edited comment on CALCITE-3827 at 3/27/20, 4:21 AM:
-------------------------------------------------------------
[~julianhyde] Thanks a lot for the good suggestion.
We added additional benchmarks and the results show that the old method
outperforms the new one.
In the discussion below, we use "old method" to refer to the previous approach
of traversing the whole graph, and "new method" to refer to our new algorithm
of traversing only related vertices.
We provide two benchmarks: one removes 10% of the vertices
(removeAllVerticesMinorityBenchmark), and the other removes 90%
(removeAllVerticesMajorityBenchmark). The results are as follows:
old approach:
DefaultDirectedGraphBenchmark.removeAllVerticesMinorityBenchmark avgt 5
38.403 ± 0.286 us/op
DefaultDirectedGraphBenchmark.removeAllVerticesMajorityBenchmark avgt 5
11.113 ± 0.054 us/op
new approach:
DefaultDirectedGraphBenchmark.removeAllVerticesMinorityBenchmark avgt 5
4.782 ± 0.017 us/op
DefaultDirectedGraphBenchmark.removeAllVerticesMajorityBenchmark avgt 5
25.888 ± 0.088 us/op
So when removing 10% vertices, the new approach has a 8x speedup, and when
removing 90% vertices, the new approach has a 2.3x slow-down (I think this is
as you have expected). So we add a swtich to the removeAllVertices method (as
you have suggested), to swtich algorithms based on the number of vertices to
remove, with the following results:
DefaultDirectedGraphBenchmark.removeAllVerticesMinorityBenchmark avgt 5
4.782 ± 0.017 us/op
DefaultDirectedGraphBenchmark.removeAllVerticesMajorityBenchmark avgt 5
11.113 ± 0.054 us/op
We also compare the performance with the code before our change (when the
in-edge table was not present), the results are as follows:
DefaultDirectedGraphBenchmark.removeAllVerticesMinorityBenchmark avgt 5
18.553 ± 0.081 us/op
DefaultDirectedGraphBenchmark.removeAllVerticesMajorityBenchmark avgt 5
4.783 ± 0.031 us/op
So compared with the code before change, our change results in a 3.9x speedup
when removing 10% vertices, and 2.2x slow down when removing 90% vertices.
So overall, our change leads to positive performance change for the
{{removeAllVertices}} operation.
(Please note that the main purpose of this issue is to improve the performance
of {{getInwardEdges}} operation, and our change brings orders of magnitude
improvement for this operation. For other operations, the performance remains
almost the same.)
was (Author: fan_li_ya):
[~julianhyde] Thanks a lot for the good suggestion.
We added additional benchmarks and the results show that the old method
outperforms the new one.
In the discussion below, we use "old method" to refer to the previous approach
of traversing the whole graph, and "new method" to refer to our new algorithm
of traversing only related vertices.
We provide two benchmarks: one removes 10% of the vertices
(removeAllVerticesMinorityBenchmark), and the other removes 90%
(removeAllVerticesMajorityBenchmark). The results are as follows:
old approach:
DefaultDirectedGraphBenchmark.removeAllVerticesMinorityBenchmark avgt 5
38.403 ± 0.286 us/op
DefaultDirectedGraphBenchmark.removeAllVerticesMajorityBenchmark avgt 5
11.113 ± 0.054 us/op
new approach:
DefaultDirectedGraphBenchmark.removeAllVerticesMinorityBenchmark avgt 5
4.782 ± 0.017 us/op
DefaultDirectedGraphBenchmark.removeAllVerticesMajorityBenchmark avgt 5
25.888 ± 0.088 us/op
So when removing 10% vertices, the new approach has a 8x speedup, and when
removing 90% vertices, the new approach has a 2.3x slow-down (I think this is
as you have expected). So we add a swtich to the removeAllVertices method (as
you have suggested), to swtich algorithms based on the number of vertices to
remove, with the following results:
DefaultDirectedGraphBenchmark.removeAllVerticesMinorityBenchmark avgt 5
4.782 ± 0.017 us/op
DefaultDirectedGraphBenchmark.removeAllVerticesMajorityBenchmark avgt 5
11.113 ± 0.054 us/op
We also compare the performance with the code before our change (when the
in-edge table was not present), the results are as follows:
DefaultDirectedGraphBenchmark.removeAllVerticesMinorityBenchmark avgt 5
18.553 ± 0.081 us/op
DefaultDirectedGraphBenchmark.removeAllVerticesMajorityBenchmark avgt 5
4.783 ± 0.031 us/op
So compared with the code before change, our change results in a 3.9x speedup
when removing 10% vertices, and 2.2x slow down when removing 90% vertices.
So overall, our change leads to positive performance change for the
{{removeAllVertices}} operation.
> Reduce the time complexity of finding in-edges of a vertex in the graph
> -----------------------------------------------------------------------
>
> Key: CALCITE-3827
> URL: https://issues.apache.org/jira/browse/CALCITE-3827
> Project: Calcite
> Issue Type: Improvement
> Components: core
> Reporter: Liya Fan
> Priority: Major
> Labels: pull-request-available
> Time Spent: 2h 40m
> Remaining Estimate: 0h
>
> In the current graph implementation, it takes O(1) time to find the out-edges
> of a vertex, but O(e) time (where e is the total number of edges in the
> graph) to find the in-edges of a vertex.
> In some scenarios (e.g. clearing cache in hep planner), this may have severe
> performance penalty. To solve this problem, we add a inward edge table, in
> addition to the existing outward edge table, to the graph. This helps to
> reduce the time complexity for finding in-edges to O(1).
> Please note that, there is extra performance overhead for maintaining the in
> edge table, when adding/deleting vertices to the graph. However, the added
> overhead only takes O(1) time.
> Finally, it should be noted that, some existing operations can benefit from
> this improvement (e.g. topological sort).
--
This message was sent by Atlassian Jira
(v8.3.4#803005)