…do your vertices implement hashCode() and equals() “correctly” ? Marko.
> On Oct 17, 2017, at 2:40 PM, Stephen Mallette <[email protected]> wrote: > >> So if I understand correctly the map is only needed for bulking so quite > often is not needed. > > afaik, it is only used for bulking though it's hard to characterize how > often it is used - i suppose it all depends on the types of traversals you > write and the nature of the data being traversed. > >> A significant difference. > > The performance numbers are interesting. You don't get a speedup in sqlg > though when bullking would be enacted though - only when bulking would have > no effect - correct? > > > > On Fri, Oct 13, 2017 at 3:48 PM, pieter gmail <[email protected]> > wrote: > >> Hi, >> >> Doing step optimizations I am noticing a rather severe performance hit in >> TraverserSet. >> >> Sqlg does a secondary optimization on steps that it can not optimize from >> the GraphStep. Before the secondary optimization these steps will execute >> at least one query for each incoming start. The optimization caches the >> incoming start traverser and the step is executed for all incoming >> traversers in one go. This has the effect of changing the semantics into a >> breath first traversal as opposed to the default depth first. >> >> So basically the replaced steps code looks like follows >> >> @Override >> protected Traverser.Admin<S> processNextStart() throws >> NoSuchElementException { >> if (this.first) { >> this.first = false; >> while (this.starts.hasNext()) { >> Traverser.Admin<S> start = this.starts.next(); >> this.traversal.addStart(start); >> } >> .... >> >> The performance hit is in the this.traversal.addStart(start) which ends up >> putting the start into the TraverserSet's internal LinkedHashMap. >> >> So if I understand correctly the map is only needed for bulking so quite >> often is not needed. Replacing the map with an ArrayList improves the >> performance drastically. >> >> For the test the optimization does the following. I replace the >> TraversalFilterStep with a custom SqlTraversalFilterStep which extends from >> a custom SqlAbstractStep. The custom SqlgAbstractStep in turn replaces the >> ExpandableStepIterator with a custom SqlgExpandableStepIterator which is a >> copy of ExpandableStepIterator except for replacing TraverserSet with a >> List<Traverser.Admin<S>> traversers = new ArrayList<>(); >> >> @Test >> public void testSqlgTraversalFilterStepPerformance() { >> this.sqlgGraph.tx().normalBatchModeOn(); >> int count = 10000; >> for (int i = 0; i < count; i++) { >> Vertex a1 = this.sqlgGraph.addVertex(T.label, "A", "name", >> "a1"); >> Vertex b1 = this.sqlgGraph.addVertex(T.label, "B", "name", >> "b1"); >> a1.addEdge("ab", b1); >> } >> this.sqlgGraph.tx().commit(); >> >> StopWatch stopWatch = new StopWatch(); >> for (int i = 0; i < 1000; i++) { >> stopWatch.start(); >> GraphTraversal<Vertex, Vertex> traversal = >> this.sqlgGraph.traversal() >> .V().hasLabel("A") >> .where(__.out().hasLabel("B")); >> List<Vertex> vertices = traversal.toList(); >> Assert.assertEquals(count, vertices.size()); >> stopWatch.stop(); >> System.out.println(stopWatch.toString()); >> stopWatch.reset(); >> } >> } >> >> Without the ArrayList optimization the output is, >> 0:00:12.198 >> 0:00:09.756 >> 0:00:09.435 >> 0:00:14.466 >> 0:00:10.197 >> 0:00:04.937 >> 0:00:02.974 >> 0:00:02.942 >> 0:00:02.977 >> 0:00:03.142 >> 0:00:03.207 >> >> With the ArrayList optimization the output is, >> 0:00:00.334 >> 0:00:00.147 >> 0:00:00.114 >> 0:00:00.100 >> ... time for jit >> 0:00:00.055 >> 0:00:00.056 >> 0:00:00.054 >> 0:00:00.053 >> 0:00:00.054 >> 0:00:00.055 >> >> A significant difference. >> >> For TinkerGraph this tests optimization is moot as the TraversalFilterStep >> resets the step for every step making the TraverserSet's map empty so the >> traversers equals method is never called. >> >> Not sure if there are scenarios where this optimization will be any good >> for TinkerGraph but thought I'd let you know how I am optimizing steps. >> >> A concern is that I am now replacing core steps which makes Sqlg further >> away from the reference implementation making it fragile to changes in >> TinkerPop and harder to keep up to upstream changes. Perhaps there is a way >> to make TravererSet's current behavior configurable? >> >> Cheers >> Pieter >> >> >> >>
