Hi,
Ok, it took a while, but I got a partial solution working.
However, for now I cannot get g.V().local(out().count()) right;
I am not able to manage localTraversal.reset() properly.
I should, however, be able to get it right later when I optimize
ReducingBarrierStep by pushing the reduction down to the db.
So for now I replace LocalStep and ChooseStep (those that do not contain any
ReducingBarrierStep, RangeGlobal or SampleStep) with custom steps that
barrier the starts.
The biggest problem was not losing the order of the incoming starts.
Basically I keep some state and sort the results afterwards.
There are more steps to replace, but for now the strategy seems to be
working.
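The "keep some state and sort afterwards" idea can be sketched in plain Java. This is a hypothetical illustration, not Sqlg's code: it assumes the starts are batched per group (the `groupKey` function is an invented stand-in for whatever the batching is keyed on) and that each batched call returns exactly one result per input, in input order. Each start is tagged with its arrival index, and the results are sorted on that index at the end. Requires Java 16+ for records.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch, not Sqlg code: batch starts per group, then restore arrival order.
final class OrderPreservingBatch {

    // A value tagged with the position of the start it belongs to.
    private record Indexed<T>(int index, T value) {}

    static <S, K, R> List<R> run(List<S> starts,
                                 Function<S, K> groupKey,
                                 Function<List<S>, List<R>> batchPerGroup) {
        // Bucket the starts by key, remembering each start's original position.
        Map<K, List<Indexed<S>>> groups = new LinkedHashMap<>();
        for (int i = 0; i < starts.size(); i++) {
            S s = starts.get(i);
            groups.computeIfAbsent(groupKey.apply(s), k -> new ArrayList<>())
                  .add(new Indexed<>(i, s));
        }
        // One batched call per group; results come back grouped, not in arrival order.
        List<Indexed<R>> results = new ArrayList<>();
        for (List<Indexed<S>> group : groups.values()) {
            List<S> batch = new ArrayList<>();
            for (Indexed<S> e : group) {
                batch.add(e.value());
            }
            List<R> out = batchPerGroup.apply(batch); // assumed: one result per input, same order
            for (int j = 0; j < group.size(); j++) {
                results.add(new Indexed<>(group.get(j).index(), out.get(j)));
            }
        }
        // Sort on the remembered positions to restore the incoming order.
        results.sort(Comparator.comparingInt(r -> r.index()));
        List<R> ordered = new ArrayList<>(results.size());
        for (Indexed<R> r : results) {
            ordered.add(r.value());
        }
        return ordered;
    }
}
```

For example, starts `a1, b1, a2, b2` grouped by first letter are processed as `[a1, a2]` and `[b1, b2]`, yet the output follows the original `a1, b1, a2, b2` order.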
Cheers
Pieter
On 20/04/2017 21:59, Marko Rodriguez wrote:
Pieter… you have a fascinating problem, because it touches on exactly
one of the fundamental concepts of Gremlin: local vs. global
traversals.
g.V().out().count() == g.V().local(out()).count()
However,
g.V().out().count() != g.V().local(out().count())
///////////////
gremlin> g.V().out().count()
==>6
gremlin> g.V().local(out()).count()
==>6
gremlin> g.V().out().count()
==>6
gremlin> g.V().local(out().count())
==>3
==>0
==>0
==>2
==>0
==>1
gremlin>
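The numbers above can be reproduced with plain Java collections. Assumption: the out-adjacency below is that of TinkerPop's "modern" toy graph, which yields exactly this console output; the class and method names are invented for illustration. The point is where count() sits: in the global form it reduces one flattened stream, in the local form it runs once per incoming vertex.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical plain-Java model of local vs. global count(); not TinkerPop code.
final class LocalVsGlobal {

    // Out-adjacency of TinkerPop's "modern" toy graph, vertices 1..6 in id order.
    static Map<Integer, List<Integer>> modernOut() {
        Map<Integer, List<Integer>> out = new LinkedHashMap<>();
        out.put(1, List.of(2, 4, 3));
        out.put(2, List.of());
        out.put(3, List.of());
        out.put(4, List.of(5, 3));
        out.put(5, List.of());
        out.put(6, List.of(3));
        return out;
    }

    // g.V().out().count(): out() flattens all neighbours into ONE stream,
    // and count() reduces that whole stream to a single number.
    static long globalCount(Map<Integer, List<Integer>> out) {
        long total = 0;
        for (List<Integer> neighbours : out.values()) {
            total += neighbours.size();
        }
        return total;
    }

    // g.V().local(out().count()): the child traversal runs once per
    // incoming vertex, so each count() only sees that vertex's neighbours.
    static List<Long> localCounts(Map<Integer, List<Integer>> out) {
        List<Long> counts = new ArrayList<>();
        for (List<Integer> neighbours : out.values()) {
            counts.add((long) neighbours.size());
        }
        return counts;
    }
}
```

`globalCount(modernOut())` returns 6 and `localCounts(modernOut())` returns `[3, 0, 0, 2, 0, 1]`, matching the session above.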
So your ability to grab the "global stream", and not just a single
object from it, within a local traversal will require some trickery.
Look at:
g.V().local(out().count())
How would we do this so that you pull all the V()s into Sqlg[out()]
but still post-process them as single streams? That's a good question. Hmmmm…
g.V().aggregate('x').cap('x').local(sqlgOut().count())
Now, Sqlg[out()] would do this:
1. Is the input a list? If yes, then execute all in batch.
2. Then pop off the first mapping as the output to count() only!
3. Then the Sqlg[out()].reset() method doesn't clear…
Wait… no, it can't do that, because then it will not next() correctly.
Hm. Wow…. mind blown.
Owwww………….
Marko.
On Apr 20, 2017, at 10:47 AM, pieter <[email protected]> wrote:
Sorry, forwarding was not a good idea either.
Here is an example with global children, where the batching works well.
Sqlg does not currently optimize the 'where' (TraversalFilterStep).
    @Test
    public void testBatchingIncomingTraversersOnVertexStep() {
        int count = 10_000;
        // Create 10,000 A -ab-> B pairs.
        for (int i = 0; i < count; i++) {
            Vertex a1 = this.sqlgGraph.addVertex(T.label, "A");
            Vertex b1 = this.sqlgGraph.addVertex(T.label, "B");
            a1.addEdge("ab", b1);
        }
        this.sqlgGraph.tx().commit();
        // Filter to the A vertices with where(), then traverse their out edges.
        GraphTraversal<Vertex, Vertex> traversal = this.sqlgGraph.traversal()
                .V().where(__.hasLabel("A"))
                .out();
        printTraversalForm(traversal);
        List<Vertex> vertices = traversal.toList();
        assertEquals(count, vertices.size());
    }
This prints out:
pre-strategy:[GraphStep(vertex,[]), TraversalFilterStep([HasStep([~label.eq(A)])]), VertexStep(OUT,vertex)]
post-strategy:[SqlgGraphStepCompiled(vertex,[])@[sqlgPathFakeLabel], HasStep([~label.eq(A)]), SqlgVertexStepCompiled@[sqlgPathFakeLabel]]
The SqlgVertexStepCompiled is able to iterate all 10 000 incoming
traversers and execute one query for the out().
This reduced the query time from 12 seconds to 0.4 seconds. Happiness!!
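To make "one query for the out()" concrete, here is a sketch of the difference in SQL text. The table `edges` and its columns `out_id`/`in_id` are hypothetical and not Sqlg's actual schema. Per traverser, the step binds one id; batched, it binds all ids in a single IN list and also selects `out_id`, so each result row can be matched back to the start it came from.

```java
import java.util.Collections;

// Hypothetical sketch: table "edges" with columns out_id/in_id is NOT Sqlg's real schema.
final class BatchedOutQuery {

    // One query per incoming traverser: bind a single element id.
    static String singleStart() {
        return "SELECT in_id FROM edges WHERE out_id = ?";
    }

    // One query for the whole batch: one placeholder per incoming traverser.
    // out_id is selected as well so rows can be routed back to their start.
    static String batched(int starts) {
        if (starts < 1) {
            throw new IllegalArgumentException("need at least one start");
        }
        String placeholders = String.join(", ", Collections.nCopies(starts, "?"));
        return "SELECT out_id, in_id FROM edges WHERE out_id IN (" + placeholders + ")";
    }
}
```

With 10,000 starts this is one round trip instead of 10,000. Very long IN lists can hit driver or database limits on bind parameters, so a real implementation might chunk the ids or use a different mechanism; that choice is outside this sketch.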
Here is an example with a local traversal:
    @Test
    public void testBatchingIncomingTraversersOnLocalVertexStep() {
        int count = 10_000;
        // Create 10,000 A -ab-> B pairs.
        for (int i = 0; i < count; i++) {
            Vertex a1 = this.sqlgGraph.addVertex(T.label, "A");
            Vertex b1 = this.sqlgGraph.addVertex(T.label, "B");
            a1.addEdge("ab", b1);
        }
        this.sqlgGraph.tx().commit();
        // Same query, but out() now runs inside a local() child traversal.
        GraphTraversal<Vertex, Vertex> traversal = this.sqlgGraph.traversal()
                .V().hasLabel("A")
                .local(
                        __.out()
                );
        printTraversalForm(traversal);
        List<Vertex> vertices = traversal.toList();
        Assert.assertEquals(count, vertices.size());
    }
This prints out:
pre-strategy:[GraphStep(vertex,[]), HasStep([~label.eq(A)]), LocalStep([VertexStep(OUT,vertex)])]
post-strategy:[SqlgGraphStepCompiled(vertex,[])@[sqlgPathFakeLabel], LocalStep([SqlgVertexStepCompiled@[sqlgPathFakeLabel]])]
In this case SqlgVertexStepCompiled sits inside the local traversal of
the LocalStep.
Iterating its starts returns only one traverser, because the LocalStep
puts only one traverser on the local traversal at a time.
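A plain-Java model of why the barrier trick fails inside a local child. This is a simplification under stated assumptions, not TinkerPop's actual LocalStep code, and the class and method names are hypothetical: the parent resets the child and hands it exactly one start per incoming traverser, so a child that drains "all" of its starts only ever sees a batch of size one.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical model of a barrier-style step in global vs. local position; not TinkerPop code.
final class LocalStepModel {

    // What a barrier-style step does: drain every start it can see.
    static int drain(Iterator<String> starts) {
        int seen = 0;
        while (starts.hasNext()) {
            starts.next();
            seen++;
        }
        return seen;
    }

    // Global position: the step's starts are the whole incoming stream.
    static int batchSizeAsGlobalStep(List<String> incoming) {
        return drain(incoming.iterator());
    }

    // Local position: for each incoming traverser the child is "reset" and
    // given a fresh iterator containing only that one traverser.
    static List<Integer> batchSizesAsLocalChild(List<String> incoming) {
        List<Integer> sizes = new ArrayList<>();
        for (String traverser : incoming) {
            sizes.add(drain(List.of(traverser).iterator()));
        }
        return sizes;
    }
}
```

For three incoming traversers the global step drains a batch of 3, while the local child sees three batches of 1.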
I suppose I could replace LocalStep with a custom one, but there are many
steps with local children, and replacing so many steps in a copy-paste
fashion would make things fragile.
Thanks
Pieter
On Thu, 2017-04-20 at 09:10 -0600, Marko Rodriguez wrote:
Hello,
I have started optimizing Sqlg to do a bulk/barrier for its
VertexStep optimizations.
Cool.
Sqlg has two optimization strategies: GraphStepStrategy and
VertexStepStrategy. GraphStepStrategy executes first, then
VertexStepStrategy.
GraphStepStrategy starts at the beginning of the traversal, optimizing
from left to right until it reaches a step that it cannot optimize,
and then terminates.
Makes sense.
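The left-to-right behaviour can be modelled as finding the longest optimizable prefix of the step list. This is a sketch over step names as plain strings; the `optimizable` set is a hypothetical stand-in for whatever rules the strategy really applies.

```java
import java.util.List;
import java.util.Set;

// Hypothetical model of a left-to-right strategy over step names; not Sqlg code.
final class LeftToRightStrategy {

    // Number of leading steps that can be folded into the optimized start
    // step; the first step that cannot be optimized ends the scan.
    static int optimizablePrefix(List<String> steps, Set<String> optimizable) {
        int n = 0;
        while (n < steps.size() && optimizable.contains(steps.get(n))) {
            n++;
        }
        return n;
    }
}
```

For `[GraphStep, HasStep, LocalStep, VertexStep]` with `LocalStep` not optimizable, the prefix length is 2: the trailing `VertexStep` is left over for the next strategy even though it is optimizable on its own.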
After that, VertexStepStrategy tries to optimize what remains.
It ultimately replaces runs of consecutive optimizable steps with a
SqlgVertexStep.
Okay...
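Replacing "optimizable sequential steps" can be sketched as collapsing each maximal run of consecutive optimizable steps into one replacement step. Again a string-based model with a hypothetical `optimizable` set; the `SqlgVertexStep[...]` strings only label which steps were merged and are not real Sqlg output.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical model: collapse runs of optimizable steps; not Sqlg code.
final class RunCollapser {

    static List<String> collapse(List<String> steps, Set<String> optimizable) {
        List<String> result = new ArrayList<>();
        List<String> run = new ArrayList<>();
        for (String step : steps) {
            if (optimizable.contains(step)) {
                run.add(step);          // extend the current run
            } else {
                flush(run, result);     // close any open run first
                result.add(step);       // keep the non-optimizable step as-is
            }
        }
        flush(run, result);             // close a run that ends the traversal
        return result;
    }

    // Emit one placeholder step for the whole run, then start a new run.
    private static void flush(List<String> run, List<String> result) {
        if (!run.isEmpty()) {
            result.add("SqlgVertexStep" + run);
            run.clear();
        }
    }
}
```

`[VertexStep, HasStep, LocalStep, VertexStep]` becomes `[SqlgVertexStep[VertexStep, HasStep], LocalStep, SqlgVertexStep[VertexStep]]`.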
Thus far the SqlgVertexStep has always had one incoming traverser from
which it continues the traversal. Basically, it translates to an SQL
where clause with the incoming traverser element's id.
The current optimization is to bulk the incoming traversers and
execute the traversal for all of them in one go. This reduces
latency and drastically improves performance.
I do the same as the existing BarrierSteps and iterate the `starts`
to collect all the incoming traversers on the left; from there I
continue and all is well.
Smart. You got chops.
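The barrier pattern described above (drain the `starts`, make one bulk call, then continue) can be sketched as follows. `bulkLookup` is a hypothetical stand-in for the single SQL round trip, not a Sqlg or TinkerPop API; its results are keyed by start so they can be emitted in arrival order.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of a draining barrier; bulkLookup stands in for one SQL round trip.
final class DrainingBarrier {

    // Drain every available start, make ONE bulk call for the whole batch,
    // then emit one result per start in the order the starts arrived.
    static <S, R> List<R> process(Iterator<S> starts,
                                  Function<List<S>, Map<S, R>> bulkLookup) {
        List<S> batch = new ArrayList<>();
        while (starts.hasNext()) {
            batch.add(starts.next());
        }
        if (batch.isEmpty()) {
            return List.of();
        }
        Map<S, R> byStart = bulkLookup.apply(batch); // the single round trip
        List<R> results = new ArrayList<>(batch.size());
        for (S start : batch) {
            results.add(byStart.get(start));
        }
        return results;
    }
}
```

However many starts arrive, `bulkLookup` is invoked once. In a local child this still helps nothing, because the iterator it drains holds a single start.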
However, for local traversals there is only one start on the
traversal, so the barrier idea does not work.
Is there a way to barrier all incoming left traversers on local
traversals?
Eeeeeeeeeeeeeeee…… huuuuuuuhhhhhhhh…………….
There is an “easy” and there is a “hard.” Give me an example traversal
and let's discuss from a more specific standpoint before
generalizing...
Thanks,
Marko.
http://markorodriguez.com