Notes on TraverserSet and Sqlg optimizations

pieter gmail Fri, 13 Oct 2017 12:49:00 -0700

Hi,

While doing step optimizations I am noticing a rather severe performance hit in TraverserSet.

Sqlg does a secondary optimization on steps that it cannot optimize from the GraphStep. Before the secondary optimization these steps will execute at least one query for each incoming start. The optimization caches the incoming start traversers and the step is executed for all incoming traversers in one go. This has the effect of changing the semantics into a breadth-first traversal as opposed to the default depth-first.
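To make the difference concrete, here is a minimal sketch in plain Java, with no TinkerPop or Sqlg dependency. The class BatchedStepSketch, the query method and the queryCount counter are hypothetical stand-ins for illustration only, not Sqlg's actual code. It only shows how caching all starts first turns N round trips into one.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class BatchedStepSketch {
    // Counts simulated database round trips (hypothetical, for illustration).
    static int queryCount = 0;

    // Hypothetical stand-in for one SQL query that resolves a batch of starts.
    static List<String> query(List<String> starts) {
        queryCount++;
        List<String> results = new ArrayList<>();
        for (String start : starts) {
            results.add(start + "->out");
        }
        return results;
    }

    // Unoptimized shape: one query per incoming start (depth-first style).
    static List<String> perStart(List<String> starts) {
        List<String> results = new ArrayList<>();
        for (String start : starts) {
            results.addAll(query(Collections.singletonList(start)));
        }
        return results;
    }

    // Optimized shape: cache every start, then run a single query for all
    // of them (breadth-first style).
    static List<String> batched(List<String> starts) {
        List<String> cached = new ArrayList<>(starts);
        return query(cached);
    }

    public static void main(String[] args) {
        List<String> starts = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            starts.add("v" + i);
        }
        queryCount = 0;
        perStart(starts);
        System.out.println("per start queries: " + queryCount);
        queryCount = 0;
        batched(starts);
        System.out.println("batched queries: " + queryCount);
    }
}
```

Both methods return the same results in the same order here. In a real traversal, only the order in which side effects happen would differ between the two.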


So basically the replaced step's code looks as follows:

    @Override
    protected Traverser.Admin<S> processNextStart() throws NoSuchElementException {
        if (this.first) {
            this.first = false;
            while (this.starts.hasNext()) {
                Traverser.Admin<S> start = this.starts.next();
                this.traversal.addStart(start);
            }
    ....

The performance hit is in this.traversal.addStart(start), which ends up putting the start into the TraverserSet's internal LinkedHashMap.

So if I understand correctly, the map is only needed for bulking, so quite often it is not needed at all. Replacing the map with an ArrayList improves the performance drastically.
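As a rough illustration of why the map costs more, the sketch below contrasts a map-backed store that merges equal traversers by summing their bulk with a plain list that just appends. FakeTraverser, addToMap, addToList and the hashCalls counter are hypothetical names made up for this example. The merging logic is an assumption about what "bulking" means here and is not copied from TinkerPop's TraverserSet.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class TraverserStoreSketch {
    // Counts hashCode() invocations so the extra work of the map is visible.
    static int hashCalls = 0;

    // Hypothetical, much simplified stand-in for a traverser with a bulk.
    static final class FakeTraverser {
        final String value;
        long bulk = 1;

        FakeTraverser(String value) {
            this.value = value;
        }

        @Override
        public int hashCode() {
            hashCalls++;
            return value.hashCode();
        }

        @Override
        public boolean equals(Object other) {
            return other instanceof FakeTraverser
                    && ((FakeTraverser) other).value.equals(this.value);
        }
    }

    // Map-backed store: every add hashes the traverser (and may call equals)
    // so that equal traversers can be merged into one entry with a larger bulk.
    static Map<FakeTraverser, FakeTraverser> addToMap(List<FakeTraverser> starts) {
        Map<FakeTraverser, FakeTraverser> map = new LinkedHashMap<>();
        for (FakeTraverser start : starts) {
            FakeTraverser existing = map.get(start);
            if (existing == null) {
                map.put(start, start);
            } else {
                existing.bulk += start.bulk;
            }
        }
        return map;
    }

    // List-backed store: a plain append, no hashing, no equality checks,
    // and therefore no bulking either.
    static List<FakeTraverser> addToList(List<FakeTraverser> starts) {
        List<FakeTraverser> list = new ArrayList<>();
        for (FakeTraverser start : starts) {
            list.add(start);
        }
        return list;
    }
}
```

The trade-off is that the list keeps duplicates as separate entries, so it is only a drop-in replacement when merging equal traversers is not needed. The real cost per add also depends on how expensive the traverser's own hashCode and equals are.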

For the test below, the optimization does the following. I replace the TraversalFilterStep with a custom SqlgTraversalFilterStep which extends a custom SqlgAbstractStep. The custom SqlgAbstractStep in turn replaces the ExpandableStepIterator with a custom SqlgExpandableStepIterator, which is a copy of ExpandableStepIterator except that TraverserSet is replaced with a List<Traverser.Admin<S>> traversers = new ArrayList<>();


    @Test
    public void testSqlgTraversalFilterStepPerformance() {
        this.sqlgGraph.tx().normalBatchModeOn();
        int count = 10000;
        for (int i = 0; i < count; i++) {
            Vertex a1 = this.sqlgGraph.addVertex(T.label, "A", "name", "a1");
            Vertex b1 = this.sqlgGraph.addVertex(T.label, "B", "name", "b1");
            a1.addEdge("ab", b1);
        }
        this.sqlgGraph.tx().commit();

        StopWatch stopWatch = new StopWatch();
        for (int i = 0; i < 1000; i++) {
            stopWatch.start();
            GraphTraversal<Vertex, Vertex> traversal = this.sqlgGraph.traversal()
                    .V().hasLabel("A")
                    .where(__.out().hasLabel("B"));
            List<Vertex> vertices = traversal.toList();
            Assert.assertEquals(count, vertices.size());
            stopWatch.stop();
            System.out.println(stopWatch.toString());
            stopWatch.reset();
        }
    }

Without the ArrayList optimization the output is,
0:00:12.198
0:00:09.756
0:00:09.435
0:00:14.466
0:00:10.197
0:00:04.937
0:00:02.974
0:00:02.942
0:00:02.977
0:00:03.142
0:00:03.207

With the ArrayList optimization the output is,
0:00:00.334
0:00:00.147
0:00:00.114
0:00:00.100
... time for jit
0:00:00.055
0:00:00.056
0:00:00.054
0:00:00.053
0:00:00.054
0:00:00.055

A significant difference.

For TinkerGraph this test's optimization is moot, as the TraversalFilterStep resets the step for every start, making the TraverserSet's map empty so the traverser's equals method is never called.

Not sure if there are scenarios where this optimization will be any good for TinkerGraph, but I thought I'd let you know how I am optimizing steps.

A concern is that I am now replacing core steps, which moves Sqlg further away from the reference implementation, making it fragile to changes in TinkerPop and harder to keep up with upstream changes. Perhaps there is a way to make TraverserSet's current behavior configurable?


Cheers
Pieter
