[ 
https://issues.apache.org/jira/browse/CALCITE-3827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049849#comment-17049849
 ] 

Julian Hyde edited comment on CALCITE-3827 at 4/8/20, 7:00 PM:
---------------------------------------------------------------

[~julianhyde] Thanks a lot for your effort for investigating the related code. 

To evaluate the effects to all operations of the graph, we have added more 
benchmarks to evaluate the performance of adding/removing/getting 
edges/vertices in the graph. The results are as follows:

before
{noformat}
Benchmark                                                 Mode  Cnt   Score    
Error  Units
DefaultDirectedGraphBenchmark.addEdgeBenchmark            avgt    5   0.110 ±  
0.002  us/op
DefaultDirectedGraphBenchmark.addVertexBenchmark          avgt    5   0.071 ±  
0.002  us/op
DefaultDirectedGraphBenchmark.getEdgeBenchmark            avgt    5   0.097 ±  
0.001  us/op
DefaultDirectedGraphBenchmark.getInwardEdgesBenchmark     avgt    5  26.604 ±  
0.047  us/op
DefaultDirectedGraphBenchmark.getOutwardEdgesBenchmark    avgt    5   0.204 ±  
0.002  us/op
DefaultDirectedGraphBenchmark.removeAllVerticesBenchmark  avgt    5  12.608 ±  
0.062  us/op
DefaultDirectedGraphBenchmark.removeEdgeBenchmark         avgt    5   0.149 ±  
0.006  us/op
{noformat}

after
{noformat}
Benchmark                                                 Mode  Cnt   Score   
Error  Units
DefaultDirectedGraphBenchmark.addEdgeBenchmark            avgt    5   0.113 ± 
0.002  us/op
DefaultDirectedGraphBenchmark.addVertexBenchmark          avgt    5   0.067 ± 
0.012  us/op
DefaultDirectedGraphBenchmark.getEdgeBenchmark            avgt    5   0.109 ± 
0.002  us/op
DefaultDirectedGraphBenchmark.getInwardEdgesBenchmark     avgt    5   0.203 ± 
0.014  us/op
DefaultDirectedGraphBenchmark.getOutwardEdgesBenchmark    avgt    5   0.212 ± 
0.026  us/op
DefaultDirectedGraphBenchmark.removeAllVerticesBenchmark  avgt    5   0.778 ± 
0.017  us/op
DefaultDirectedGraphBenchmark.removeEdgeBenchmark         avgt    5   0.160 ± 
0.003  us/op
{noformat}

It can be observed that the performance of getting inward edges improves by 
orders of magnitude, while the overheads for other operations are marginal. 

One more thing to note is that removeAllVertices is another operation that 
limits the scalability of the graph. This is because, our current 
implementation involves traversing the whole graph, even if only a small number 
of vertices needs to be removed.

We improve the removeAllVertices operation by traversing only vertices related 
to vertices that will be removed. This leads to orders of magnitude performance 
improvement. Please note that this improvement is only possible when both the 
in-ward and out-ward edge tables are available. 


was (Author: fan_li_ya):
[~julianhyde] Thanks a lot for your effort for investigating the related code. 

To evaluate the effects to all operations of the graph, we have added more 
benchmarks to evaluate the performance of adding/removing/getting 
edges/vertices in the graph. The results are as follows:

before
Benchmark                                                 Mode  Cnt   Score    
Error  Units
DefaultDirectedGraphBenchmark.addEdgeBenchmark            avgt    5   0.110 ±  
0.002  us/op
DefaultDirectedGraphBenchmark.addVertexBenchmark          avgt    5   0.071 ±  
0.002  us/op
DefaultDirectedGraphBenchmark.getEdgeBenchmark            avgt    5   0.097 ±  
0.001  us/op
DefaultDirectedGraphBenchmark.getInwardEdgesBenchmark     avgt    5  26.604 ±  
0.047  us/op
DefaultDirectedGraphBenchmark.getOutwardEdgesBenchmark    avgt    5   0.204 ±  
0.002  us/op
DefaultDirectedGraphBenchmark.removeAllVerticesBenchmark  avgt    5  12.608 ±  
0.062  us/op
DefaultDirectedGraphBenchmark.removeEdgeBenchmark         avgt    5   0.149 ±  
0.006  us/op

after
Benchmark                                                 Mode  Cnt   Score   
Error  Units
DefaultDirectedGraphBenchmark.addEdgeBenchmark            avgt    5   0.113 ± 
0.002  us/op
DefaultDirectedGraphBenchmark.addVertexBenchmark          avgt    5   0.067 ± 
0.012  us/op
DefaultDirectedGraphBenchmark.getEdgeBenchmark            avgt    5   0.109 ± 
0.002  us/op
DefaultDirectedGraphBenchmark.getInwardEdgesBenchmark     avgt    5   0.203 ± 
0.014  us/op
DefaultDirectedGraphBenchmark.getOutwardEdgesBenchmark    avgt    5   0.212 ± 
0.026  us/op
DefaultDirectedGraphBenchmark.removeAllVerticesBenchmark  avgt    5   0.778 ± 
0.017  us/op
DefaultDirectedGraphBenchmark.removeEdgeBenchmark         avgt    5   0.160 ± 
0.003  us/op

It can be observed that the performance of getting inward edges improves by 
orders of magnitude, while the overheads for other operations are marginal. 

One more thing to note is that removeAllVertices is another operation that 
limits the scalability of the graph. This is because, our current 
implementation involves traversing the whole graph, even if only a small number 
of vertices needs to be removed.

We improve the removeAllVertices operation by traversing only vertices related 
to vertices that will be removed. This leads to orders of magnitude performance 
improvement. Please note that this improvement is only possible when both the 
in-ward and out-ward edge tables are available. 

> Reduce the time complexity of finding in-edges of a vertex in the graph
> -----------------------------------------------------------------------
>
>                 Key: CALCITE-3827
>                 URL: https://issues.apache.org/jira/browse/CALCITE-3827
>             Project: Calcite
>          Issue Type: Improvement
>          Components: core
>            Reporter: Liya Fan
>            Assignee: Julian Hyde
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> In the current graph implementation, it takes O(1) time to find the out-edges 
> of a vertex, but O(e) time (where e is the total number of edges in the 
> graph) to find the in-edges of a vertex. 
> In some scenarios (e.g. clearing cache in hep planner), this may have severe 
> performance penalty.  To solve this problem, we add a inward edge table, in 
> addition to the existing outward edge table, to the graph. This helps to 
> reduce the time complexity for finding in-edges to O(1). 
> Please note that, there is extra performance overhead for maintaining the in 
> edge table, when adding/deleting vertices to the graph. However, the added 
> overhead only takes O(1) time.
> Finally, it should be noted that, some existing operations can benefit from 
> this improvement (e.g. topological sort). 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to