[ https://issues.apache.org/jira/browse/CALCITE-3827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17053864#comment-17053864 ]

Liya Fan edited comment on CALCITE-3827 at 3/27/20, 4:21 AM:
-------------------------------------------------------------

[~julianhyde] Thanks a lot for the good suggestion.

We added additional benchmarks, and the results show that the old method 
outperforms the new one only when most of the vertices are removed; when 
only a few are removed, the new method wins. 

In the discussion below, "old method" refers to the previous approach of 
traversing the whole graph, and "new method" refers to our new algorithm of 
traversing only the affected vertices. 

We provide two benchmarks: one removes 10% of the vertices 
(removeAllVerticesMinorityBenchmark), and the other removes 90% 
(removeAllVerticesMajorityBenchmark). The results are as follows:

old approach:
DefaultDirectedGraphBenchmark.removeAllVerticesMinorityBenchmark  avgt    5  38.403 ± 0.286  us/op
DefaultDirectedGraphBenchmark.removeAllVerticesMajorityBenchmark  avgt    5  11.113 ± 0.054  us/op

new approach:
DefaultDirectedGraphBenchmark.removeAllVerticesMinorityBenchmark  avgt    5   4.782 ± 0.017  us/op
DefaultDirectedGraphBenchmark.removeAllVerticesMajorityBenchmark  avgt    5  25.888 ± 0.088  us/op

So when removing 10% of the vertices, the new approach gives an 8x speedup, 
and when removing 90%, it incurs a 2.3x slowdown (I think this is what you 
expected). So, as you suggested, we added a switch to the removeAllVertices 
method that chooses the algorithm based on the number of vertices to remove, 
with the following results:

DefaultDirectedGraphBenchmark.removeAllVerticesMinorityBenchmark  avgt    5   4.782 ± 0.017  us/op
DefaultDirectedGraphBenchmark.removeAllVerticesMajorityBenchmark  avgt    5  11.113 ± 0.054  us/op
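The switch described above could be sketched roughly as follows. This is an 
illustrative re-implementation, not the actual Calcite patch: the class and 
method names and the 50% threshold are my assumptions.

```java
import java.util.*;

// Illustrative sketch of the hybrid strategy; names and the 50% threshold
// are assumptions, not the actual Calcite implementation.
class DirectedGraphSketch<V> {
  private final Map<V, Set<V>> out = new LinkedHashMap<>(); // vertex -> successors
  private final Map<V, Set<V>> in = new LinkedHashMap<>();  // vertex -> predecessors (in-edge table)

  void addVertex(V v) {
    out.putIfAbsent(v, new LinkedHashSet<>());
    in.putIfAbsent(v, new LinkedHashSet<>());
  }

  void addEdge(V source, V target) {
    out.get(source).add(target);
    in.get(target).add(source); // O(1) bookkeeping when an edge is added
  }

  Set<V> getInwardEdges(V v) {
    return in.get(v); // O(1) lookup instead of scanning every edge
  }

  void removeAllVertices(Collection<V> toRemove) {
    // Switch algorithms on batch size: walking only the neighbors of the
    // removed vertices wins for small batches, a full scan for large ones.
    if (toRemove.size() * 2 < out.size()) {
      removeRelated(toRemove);
    } else {
      removeByFullScan(toRemove);
    }
  }

  // "New method": touch only vertices adjacent to the removed ones.
  private void removeRelated(Collection<V> toRemove) {
    for (V v : toRemove) {
      Set<V> succs = out.remove(v);
      if (succs == null) {
        continue; // not present or already removed
      }
      for (V succ : succs) {
        Set<V> preds = in.get(succ);
        if (preds != null) {
          preds.remove(v);
        }
      }
      for (V pred : in.remove(v)) {
        Set<V> ss = out.get(pred);
        if (ss != null) {
          ss.remove(v);
        }
      }
    }
  }

  // "Old method": drop the vertices, then scan every adjacency list once.
  private void removeByFullScan(Collection<V> toRemove) {
    out.keySet().removeAll(toRemove);
    in.keySet().removeAll(toRemove);
    for (Set<V> succs : out.values()) {
      succs.removeAll(toRemove);
    }
    for (Set<V> preds : in.values()) {
      preds.removeAll(toRemove);
    }
  }
}
```

With this threshold, a 10%-removal batch takes the neighbor-walking path and 
a 90%-removal batch takes the full-scan path, matching the benchmark above.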

We also compared the performance with the code before our change (when the 
in-edge table was not present); the results are as follows:

DefaultDirectedGraphBenchmark.removeAllVerticesMinorityBenchmark  avgt    5  18.553 ± 0.081  us/op
DefaultDirectedGraphBenchmark.removeAllVerticesMajorityBenchmark  avgt    5   4.783 ± 0.031  us/op

So compared with the code before the change, our change yields a 3.9x speedup 
when removing 10% of the vertices, and a 2.2x slowdown when removing 90%. 

Overall, our change results in a net performance improvement for the 
{{removeAllVertices}} operation.

(Please note that the main purpose of this issue is to improve the performance 
of the {{getInwardEdges}} operation, and our change brings an 
orders-of-magnitude improvement there. For other operations, the performance 
remains almost the same.)
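For contrast, here is a minimal sketch of the two lookup styles: the 
pre-change O(e) scan over every edge versus an O(1) lookup backed by an 
in-edge table. The {{Edge}} record and all names here are illustrative, not 
Calcite's actual types.

```java
import java.util.*;

// Hedged sketch contrasting the pre-change O(e) in-edge lookup with the
// O(1) lookup an in-edge table enables; Edge and all names are assumptions.
class InEdgeLookupDemo {
  record Edge(String source, String target) {}

  // Pre-change style: scan every edge in the graph, O(e) per call.
  static List<Edge> inwardEdgesByScan(List<Edge> allEdges, String target) {
    List<Edge> result = new ArrayList<>();
    for (Edge e : allEdges) {
      if (e.target().equals(target)) {
        result.add(e);
      }
    }
    return result;
  }

  // Post-change style: maintain target -> in-edges as edges are added,
  // so each subsequent lookup is a single O(1) map access.
  static Map<String, List<Edge>> buildInEdgeTable(List<Edge> allEdges) {
    Map<String, List<Edge>> table = new HashMap<>();
    for (Edge e : allEdges) {
      table.computeIfAbsent(e.target(), k -> new ArrayList<>()).add(e);
    }
    return table;
  }
}
```

The table costs O(1) extra bookkeeping per edge insertion or deletion, which 
matches the overhead described in the issue.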



> Reduce the time complexity of finding in-edges of a vertex in the graph
> -----------------------------------------------------------------------
>
>                 Key: CALCITE-3827
>                 URL: https://issues.apache.org/jira/browse/CALCITE-3827
>             Project: Calcite
>          Issue Type: Improvement
>          Components: core
>            Reporter: Liya Fan
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> In the current graph implementation, it takes O(1) time to find the out-edges 
> of a vertex, but O(e) time (where e is the total number of edges in the 
> graph) to find the in-edges of a vertex. 
> In some scenarios (e.g. clearing the cache in the hep planner), this may incur 
> a severe performance penalty. To solve this problem, we add an inward-edge 
> table, in addition to the existing outward-edge table, to the graph. This 
> reduces the time complexity of finding in-edges to O(1). 
> Please note that there is extra overhead for maintaining the in-edge table 
> when adding/deleting vertices. However, this added overhead only takes O(1) 
> time.
> Finally, it should be noted that some existing operations (e.g. topological 
> sort) can benefit from this improvement. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
