[
https://issues.apache.org/jira/browse/TINKERPOP-2256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16881520#comment-16881520
]
Wang Xiyu edited comment on TINKERPOP-2256 at 7/10/19 12:42 PM:
----------------------------------------------------------------
I will provide a better benchmark between the version avoiding the hasNext()
call and the origin version tomorrow.
============
But now there is only a simple benchmark with TinkerGraph:
Machine : MacBook Pro (13-inch, 2019), 2.4 GHz Intel Core i5
{code:java}
gremlin> graph = TinkerFactory.createGratefulDead()
gremlin> g = graph.traversal()
gremlin> g.V(1).out().out().out().out().aggregate('a').path().profile()
==>Traversal Metrics
Step Count
Traversers Time (ms) % Dur
=============================================================================================================
TinkerGraphStep(vertex,[1]) 1
1 110.440 7.48
VertexStep(OUT,vertex) 7
7 132.841 8.99
VertexStep(OUT,vertex) 190
190 116.895 7.91
VertexStep(OUT,vertex) 7846
7846 155.564 10.53
VertexStep(OUT,vertex) 344093
344093 210.354 14.24
AggregateStep(a) 344093
343721 643.464 43.55
PathStep 344093
343721 107.842 7.30
>TOTAL -
- 1477.402
gremlin> g.V(1).out().out().out().out().path().profile()
==>Traversal Metrics
Step Count
Traversers Time (ms) % Dur
=============================================================================================================
TinkerGraphStep(vertex,[1]) 1
1 0.052 0.03
VertexStep(OUT,vertex) 7
7 0.015 0.01
VertexStep(OUT,vertex) 190
190 0.072 0.04
VertexStep(OUT,vertex) 7846
7846 1.921 1.02
VertexStep(OUT,vertex) 344093
344093 88.395 46.97
PathStep 344093
344093 97.757 51.94
>TOTAL -
- 188.215 -
gremlin> g.V(1).out().out().out().out().store('a').path().profile()
==>Traversal Metrics
Step Count
Traversers Time (ms) % Dur
=============================================================================================================
TinkerGraphStep(vertex,[1]) 1
1 0.061 0.02
VertexStep(OUT,vertex) 7
7 0.017 0.01
VertexStep(OUT,vertex) 190
190 0.078 0.02
VertexStep(OUT,vertex) 7846
7846 1.947 0.58
VertexStep(OUT,vertex) 344093
344093 78.538 23.45
StoreStep(a) 344093
344093 155.571 46.44
PathStep 344093
344093 98.765 29.48
>TOTAL -
- 334.980 -
{code}
out().out().out().out() is used to create enough Traversers.
path() is used to disable LazyBarrierStrategy.
This simple benchmark shows that AggregateStep cause huge overhead on the first
step ( TinkerGraphStep(vertex,[1]) ). After we broken down the 110.440ms, we
found that most of it was wasted by hasNext.
was (Author: wangxiyu191):
I will provide a better benchmark between the version avoiding the hasNext()
call and the origin version tomorrow.
============
But now there is only a simple benchmark with TinkerGraph:
{code:java}
gremlin> graph = TinkerFactory.createGratefulDead()
gremlin> g = graph.traversal()
gremlin> g.V(1).out().out().out().out().aggregate('a').path().profile()
==>Traversal Metrics
Step Count
Traversers Time (ms) % Dur
=============================================================================================================
TinkerGraphStep(vertex,[1]) 1
1 110.440 7.48
VertexStep(OUT,vertex) 7
7 132.841 8.99
VertexStep(OUT,vertex) 190
190 116.895 7.91
VertexStep(OUT,vertex) 7846
7846 155.564 10.53
VertexStep(OUT,vertex) 344093
344093 210.354 14.24
AggregateStep(a) 344093
343721 643.464 43.55
PathStep 344093
343721 107.842 7.30
>TOTAL -
- 1477.402
gremlin> g.V(1).out().out().out().out().path().profile()
==>Traversal Metrics
Step Count
Traversers Time (ms) % Dur
=============================================================================================================
TinkerGraphStep(vertex,[1]) 1
1 0.052 0.03
VertexStep(OUT,vertex) 7
7 0.015 0.01
VertexStep(OUT,vertex) 190
190 0.072 0.04
VertexStep(OUT,vertex) 7846
7846 1.921 1.02
VertexStep(OUT,vertex) 344093
344093 88.395 46.97
PathStep 344093
344093 97.757 51.94
>TOTAL -
- 188.215 -
gremlin> g.V(1).out().out().out().out().store('a').path().profile()
==>Traversal Metrics
Step Count
Traversers Time (ms) % Dur
=============================================================================================================
TinkerGraphStep(vertex,[1]) 1
1 0.061 0.02
VertexStep(OUT,vertex) 7
7 0.017 0.01
VertexStep(OUT,vertex) 190
190 0.078 0.02
VertexStep(OUT,vertex) 7846
7846 1.947 0.58
VertexStep(OUT,vertex) 344093
344093 78.538 23.45
StoreStep(a) 344093
344093 155.571 46.44
PathStep 344093
344093 98.765 29.48
>TOTAL -
- 334.980 -
{code}
out().out().out().out() is used to create enough Traversers.
path() is used to disable LazyBarrierStrategy.
This simple benchmark shows that AggregateStep cause huge overhead on the first
step ( TinkerGraphStep(vertex,[1]) ). After we broken down the 110.440ms, we
found that most of it was wasted by hasNext.
> processAllStarts of AggregateStep should be called only once
> ------------------------------------------------------------
>
> Key: TINKERPOP-2256
> URL: https://issues.apache.org/jira/browse/TINKERPOP-2256
> Project: TinkerPop
> Issue Type: Improvement
> Components: process
> Reporter: Wang Xiyu
> Priority: Minor
>
> Currently the function processNextStart , hasNextBarrier, nextBarrier of
> AggregateStep all call AggregateStep.processAllStarts:
> {code:java}
> protected Traverser.Admin<S> processNextStart() {
> this.processAllStarts();
> return this.barrier.remove();
> }{code}
> Every time we get a traverser from AggregateStep,
> AggregateStep.processAllStarts will be called. Then processAllStarts calls
> this.starts.hasNext().
> {code:java}
> @Override
> public void processAllStarts() {
> if (this.starts.hasNext()) {
> final BulkSet<Object> bulkSet = new BulkSet<>();
> while (this.starts.hasNext()) {
> final Traverser.Admin<S> traverser = this.starts.next();
> bulkSet.add(TraversalUtil.applyNullable(traverser,
> this.aggregateTraversal), traverser.bulk());
> traverser.setStepId(this.getNextStep().getId()); // when
> barrier is reloaded, the traversers should be at the next step
> this.barrier.add(traverser);
> }
> this.getTraversal().getSideEffects().add(this.sideEffectKey,
> bulkSet);
> }
> }
> {code}
> It results in a lot of hasNext call.
> As document says "The step uses [eager
> evaluation|http://en.wikipedia.org/wiki/Eager_evaluation] in that no objects
> continue on until all previous objects have been fully aggregated." maybe we
> can limit the AggregateStep.processAllStarts only be called once.
>
>
>
> We found this when we run DSL like this :
> {code:java}
> g.V().has('name','wxy').repeat(both("knows").simplePath()).emit().times(2).aggregate("friends"){code}
> and the plan is like this :
> {code:java}
> GraphStep(vertex,[name.eq(wxy)]),
> RepeatStep([VertexStep(BOTH,[knows],vertex), PathFilterStep(simple),
> RepeatEndStep],until(loops(2)),emit(true)), AggregateStep(friends)
> {code}
> Then we found thousands of calls to GraphStep(vertex,[name.eq(wxy)]).hasNext.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)