[jira] [Comment Edited] (TINKERPOP-2256) processAllStarts of AggregateStep should be called only once

Wang Xiyu (JIRA) Wed, 10 Jul 2019 05:43:35 -0700


    [ 
https://issues.apache.org/jira/browse/TINKERPOP-2256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16881520#comment-16881520
 ]


Wang Xiyu edited comment on TINKERPOP-2256 at 7/10/19 12:42 PM:
----------------------------------------------------------------

I will provide a better benchmark between the version avoiding the hasNext() 
call and the origin version tomorrow.

============

But now there is only a simple benchmark with TinkerGraph:
Machine : MacBook Pro (13-inch, 2019), 2.4 GHz Intel Core i5
{code:java}
gremlin> graph = TinkerFactory.createGratefulDead()
gremlin> g = graph.traversal()
gremlin> g.V(1).out().out().out().out().aggregate('a').path().profile()
==>Traversal Metrics
Step                                                               Count  
Traversers       Time (ms)    % Dur
=============================================================================================================
TinkerGraphStep(vertex,[1])                                            1        
   1         110.440     7.48
VertexStep(OUT,vertex)                                                 7        
   7         132.841     8.99
VertexStep(OUT,vertex)                                               190        
 190         116.895     7.91
VertexStep(OUT,vertex)                                              7846        
7846         155.564    10.53
VertexStep(OUT,vertex)                                            344093      
344093         210.354    14.24
AggregateStep(a)                                                  344093      
343721         643.464    43.55
PathStep                                                          344093      
343721         107.842     7.30
                                            >TOTAL                     -        
   -        1477.402
gremlin> g.V(1).out().out().out().out().path().profile()
==>Traversal Metrics
Step                                                               Count  
Traversers       Time (ms)    % Dur
=============================================================================================================
TinkerGraphStep(vertex,[1])                                            1        
   1           0.052     0.03
VertexStep(OUT,vertex)                                                 7        
   7           0.015     0.01
VertexStep(OUT,vertex)                                               190        
 190           0.072     0.04
VertexStep(OUT,vertex)                                              7846        
7846           1.921     1.02
VertexStep(OUT,vertex)                                            344093      
344093          88.395    46.97
PathStep                                                          344093      
344093          97.757    51.94
                                            >TOTAL                     -        
   -         188.215        -
gremlin> g.V(1).out().out().out().out().store('a').path().profile()
==>Traversal Metrics
Step                                                               Count  
Traversers       Time (ms)    % Dur
=============================================================================================================
TinkerGraphStep(vertex,[1])                                            1        
   1           0.061     0.02
VertexStep(OUT,vertex)                                                 7        
   7           0.017     0.01
VertexStep(OUT,vertex)                                               190        
 190           0.078     0.02
VertexStep(OUT,vertex)                                              7846        
7846           1.947     0.58
VertexStep(OUT,vertex)                                            344093      
344093          78.538    23.45
StoreStep(a)                                                      344093      
344093         155.571    46.44
PathStep                                                          344093      
344093          98.765    29.48
                                            >TOTAL                     -        
   -         334.980        -
{code}
out().out().out().out() is used to create enough Traversers.
 path() is used to disable LazyBarrierStrategy.

This simple benchmark shows that AggregateStep cause huge overhead on the first 
step ( TinkerGraphStep(vertex,[1]) ). After we broken down the 110.440ms, we 
found that most of it was wasted by hasNext.


was (Author: wangxiyu191):
I will provide a better benchmark between the version avoiding the hasNext() 
call and the origin version tomorrow.

============

But now there is only a simple benchmark with TinkerGraph:
{code:java}
gremlin> graph = TinkerFactory.createGratefulDead()
gremlin> g = graph.traversal()
gremlin> g.V(1).out().out().out().out().aggregate('a').path().profile()
==>Traversal Metrics
Step                                                               Count  
Traversers       Time (ms)    % Dur
=============================================================================================================
TinkerGraphStep(vertex,[1])                                            1        
   1         110.440     7.48
VertexStep(OUT,vertex)                                                 7        
   7         132.841     8.99
VertexStep(OUT,vertex)                                               190        
 190         116.895     7.91
VertexStep(OUT,vertex)                                              7846        
7846         155.564    10.53
VertexStep(OUT,vertex)                                            344093      
344093         210.354    14.24
AggregateStep(a)                                                  344093      
343721         643.464    43.55
PathStep                                                          344093      
343721         107.842     7.30
                                            >TOTAL                     -        
   -        1477.402
gremlin> g.V(1).out().out().out().out().path().profile()
==>Traversal Metrics
Step                                                               Count  
Traversers       Time (ms)    % Dur
=============================================================================================================
TinkerGraphStep(vertex,[1])                                            1        
   1           0.052     0.03
VertexStep(OUT,vertex)                                                 7        
   7           0.015     0.01
VertexStep(OUT,vertex)                                               190        
 190           0.072     0.04
VertexStep(OUT,vertex)                                              7846        
7846           1.921     1.02
VertexStep(OUT,vertex)                                            344093      
344093          88.395    46.97
PathStep                                                          344093      
344093          97.757    51.94
                                            >TOTAL                     -        
   -         188.215        -
gremlin> g.V(1).out().out().out().out().store('a').path().profile()
==>Traversal Metrics
Step                                                               Count  
Traversers       Time (ms)    % Dur
=============================================================================================================
TinkerGraphStep(vertex,[1])                                            1        
   1           0.061     0.02
VertexStep(OUT,vertex)                                                 7        
   7           0.017     0.01
VertexStep(OUT,vertex)                                               190        
 190           0.078     0.02
VertexStep(OUT,vertex)                                              7846        
7846           1.947     0.58
VertexStep(OUT,vertex)                                            344093      
344093          78.538    23.45
StoreStep(a)                                                      344093      
344093         155.571    46.44
PathStep                                                          344093      
344093          98.765    29.48
                                            >TOTAL                     -        
   -         334.980        -
{code}
out().out().out().out() is used to create enough Traversers.
 path() is used to disable LazyBarrierStrategy.

This simple benchmark shows that AggregateStep cause huge overhead on the first 
step ( TinkerGraphStep(vertex,[1]) ). After we broken down the 110.440ms, we 
found that most of it was wasted by hasNext.

> processAllStarts of AggregateStep should be called only once
> ------------------------------------------------------------
>
>                 Key: TINKERPOP-2256
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP-2256
>             Project: TinkerPop
>          Issue Type: Improvement
>          Components: process
>            Reporter: Wang Xiyu
>            Priority: Minor
>
> Currently the function processNextStart , hasNextBarrier, nextBarrier of 
> AggregateStep all call AggregateStep.processAllStarts:
> {code:java}
>     protected Traverser.Admin<S> processNextStart() {
>         this.processAllStarts();
>         return this.barrier.remove();
>     }{code}
> Every time we get a traverser from AggregateStep, 
> AggregateStep.processAllStarts will be called. Then processAllStarts calls 
> this.starts.hasNext().
> {code:java}
>     @Override
>     public void processAllStarts() {
>         if (this.starts.hasNext()) {
>             final BulkSet<Object> bulkSet = new BulkSet<>();
>             while (this.starts.hasNext()) {
>                 final Traverser.Admin<S> traverser = this.starts.next();
>                 bulkSet.add(TraversalUtil.applyNullable(traverser, 
> this.aggregateTraversal), traverser.bulk());
>                 traverser.setStepId(this.getNextStep().getId()); // when 
> barrier is reloaded, the traversers should be at the next step
>                 this.barrier.add(traverser);
>             }
>             this.getTraversal().getSideEffects().add(this.sideEffectKey, 
> bulkSet);
>         }
>     }
> {code}
> It results in a lot of hasNext call.
> As document says "The step uses [eager 
> evaluation|http://en.wikipedia.org/wiki/Eager_evaluation] in that no objects 
> continue on until all previous objects have been fully aggregated." maybe we 
> can limit the AggregateStep.processAllStarts only be called once.
>  
>  
>  
> We found this when we run DSL like this :
> {code:java}
> g.V().has('name','wxy').repeat(both("knows").simplePath()).emit().times(2).aggregate("friends"){code}
> and the plan is like this :
> {code:java}
> GraphStep(vertex,[name.eq(wxy)]), 
> RepeatStep([VertexStep(BOTH,[knows],vertex), PathFilterStep(simple), 
> RepeatEndStep],until(loops(2)),emit(true)), AggregateStep(friends)
> {code}
> Then we found thousands of calls to GraphStep(vertex,[name.eq(wxy)]).hasNext.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Comment Edited] (TINKERPOP-2256) processAllStarts of AggregateStep should be called only once

Reply via email to