Re: Notes on TraverserSet and Sqlg optimizations

pieter gmail Tue, 17 Oct 2017 23:24:05 -0700

Yes the hasCode() and equals() is correct. It is however a slightlyheavier operation than TinkerGraph as Sqlg's Element's id is a morecomplex object holding the label and its id.

I should have mentioned that in Sqlg the traverser is always aB_LP_O_P_S_SE_SL_Traverser. As Sqlg returns multiple VertexSteps in onego I use the path information to reconstruct the jdbc ResultSet from thedb. This makes the hashCode() and equals() operation heavier as it iscalled on B_LP_O_P_S_SE_SL_Traverser which calls hashCode() and equals()on Path and they in turn are non trivial operations.


Cheers
Pieter



On 17/10/2017 23:58, Marko Rodriguez wrote:

…do your vertices implement hashCode() and equals() “correctly” ?

Marko.

On Oct 17, 2017, at 2:40 PM, Stephen Mallette <[email protected]> wrote:

So if I understand correctly the map is only needed for bulking so quite

often is not needed.

afaik, it is only used for bulking though it's hard to characterize how
often it is used - i suppose it all depends on the types of traversals you
write and the nature of the data being traversed.

A significant difference.

The performance numbers are interesting. You don't get a speedup in sqlg
though when bullking would be enacted though - only when bulking would have
no effect - correct?



On Fri, Oct 13, 2017 at 3:48 PM, pieter gmail <[email protected]>
wrote:

Hi,

Doing step optimizations I am noticing a rather severe performance hit in
TraverserSet.

Sqlg does a secondary optimization on steps that it can not optimize from
the GraphStep. Before the secondary optimization these steps will execute
at least one query for each incoming start. The optimization caches the
incoming start traverser and the step is executed for all incoming
traversers in one go. This has the effect of changing the semantics into a
breath first traversal as opposed to the default depth first.

So basically the replaced steps code looks like follows

    @Override
    protected Traverser.Admin<S> processNextStart() throws
NoSuchElementException {
        if (this.first) {
            this.first = false;
            while (this.starts.hasNext()) {
                Traverser.Admin<S> start = this.starts.next();
                this.traversal.addStart(start);
            }
    ....

The performance hit is in the this.traversal.addStart(start) which ends up
putting the start into the TraverserSet's internal LinkedHashMap.

So if I understand correctly the map is only needed for bulking so quite
often is not needed. Replacing the map with an ArrayList improves the
performance drastically.

For the test the optimization does the following. I replace the
TraversalFilterStep with a custom SqlTraversalFilterStep which extends from
a custom SqlAbstractStep. The custom SqlgAbstractStep in turn replaces the
ExpandableStepIterator with a custom SqlgExpandableStepIterator which is a
copy of ExpandableStepIterator except for replacing TraverserSet with a
List<Traverser.Admin<S>> traversers = new ArrayList<>();

    @Test
    public void testSqlgTraversalFilterStepPerformance() {
        this.sqlgGraph.tx().normalBatchModeOn();
        int count = 10000;
        for (int i = 0; i < count; i++) {
            Vertex a1 = this.sqlgGraph.addVertex(T.label, "A", "name",
"a1");
            Vertex b1 = this.sqlgGraph.addVertex(T.label, "B", "name",
"b1");
            a1.addEdge("ab", b1);
        }
        this.sqlgGraph.tx().commit();

        StopWatch stopWatch = new StopWatch();
        for (int i = 0; i < 1000; i++) {
            stopWatch.start();
            GraphTraversal<Vertex, Vertex> traversal =
this.sqlgGraph.traversal()
                    .V().hasLabel("A")
                    .where(__.out().hasLabel("B"));
            List<Vertex> vertices = traversal.toList();
            Assert.assertEquals(count, vertices.size());
            stopWatch.stop();
            System.out.println(stopWatch.toString());
            stopWatch.reset();
        }
    }

Without the ArrayList optimization the output is,
0:00:12.198
0:00:09.756
0:00:09.435
0:00:14.466
0:00:10.197
0:00:04.937
0:00:02.974
0:00:02.942
0:00:02.977
0:00:03.142
0:00:03.207

With the ArrayList optimization the output is,
0:00:00.334
0:00:00.147
0:00:00.114
0:00:00.100
... time for jit
0:00:00.055
0:00:00.056
0:00:00.054
0:00:00.053
0:00:00.054
0:00:00.055

A significant difference.

For TinkerGraph this tests optimization is moot as the TraversalFilterStep
resets the step for every step making the TraverserSet's map empty so the
traversers equals method is never called.

Not sure if there are scenarios where this optimization will be any good
for TinkerGraph but thought I'd let you know how I am optimizing steps.

A concern is that I am now replacing core steps which makes Sqlg further
away from the reference implementation making it fragile to changes in
TinkerPop and harder to keep up to upstream changes. Perhaps there is a way
to make TravererSet's current behavior configurable?

Cheers
Pieter

Re: Notes on TraverserSet and Sqlg optimizations

Reply via email to