…do your vertices implement hashCode() and equals() “correctly” ?

Marko.



> On Oct 17, 2017, at 2:40 PM, Stephen Mallette <[email protected]> wrote:
> 
>> So if I understand correctly the map is only needed for bulking so quite
> often is not needed.
> 
> afaik, it is only used for bulking though it's hard to characterize how
> often it is used - i suppose it all depends on the types of traversals you
> write and the nature of the data being traversed.
> 
>> A significant difference.
> 
> The performance numbers are interesting. You don't get a speedup in sqlg
> though when bullking would be enacted though - only when bulking would have
> no effect - correct?
> 
> 
> 
> On Fri, Oct 13, 2017 at 3:48 PM, pieter gmail <[email protected]>
> wrote:
> 
>> Hi,
>> 
>> Doing step optimizations I am noticing a rather severe performance hit in
>> TraverserSet.
>> 
>> Sqlg does a secondary optimization on steps that it can not optimize from
>> the GraphStep. Before the secondary optimization these steps will execute
>> at least one query for each incoming start. The optimization caches the
>> incoming start traverser and the step is executed for all incoming
>> traversers in one go. This has the effect of changing the semantics into a
>> breath first traversal as opposed to the default depth first.
>> 
>> So basically the replaced steps code looks like follows
>> 
>>    @Override
>>    protected Traverser.Admin<S> processNextStart() throws
>> NoSuchElementException {
>>        if (this.first) {
>>            this.first = false;
>>            while (this.starts.hasNext()) {
>>                Traverser.Admin<S> start = this.starts.next();
>>                this.traversal.addStart(start);
>>            }
>>    ....
>> 
>> The performance hit is in the this.traversal.addStart(start) which ends up
>> putting the start into the TraverserSet's internal LinkedHashMap.
>> 
>> So if I understand correctly the map is only needed for bulking so quite
>> often is not needed. Replacing the map with an ArrayList improves the
>> performance drastically.
>> 
>> For the test the optimization does the following. I replace the
>> TraversalFilterStep with a custom SqlTraversalFilterStep which extends from
>> a custom SqlAbstractStep. The custom SqlgAbstractStep in turn replaces the
>> ExpandableStepIterator with a custom SqlgExpandableStepIterator which is a
>> copy of ExpandableStepIterator except for replacing TraverserSet with a
>> List<Traverser.Admin<S>> traversers = new ArrayList<>();
>> 
>>    @Test
>>    public void testSqlgTraversalFilterStepPerformance() {
>>        this.sqlgGraph.tx().normalBatchModeOn();
>>        int count = 10000;
>>        for (int i = 0; i < count; i++) {
>>            Vertex a1 = this.sqlgGraph.addVertex(T.label, "A", "name",
>> "a1");
>>            Vertex b1 = this.sqlgGraph.addVertex(T.label, "B", "name",
>> "b1");
>>            a1.addEdge("ab", b1);
>>        }
>>        this.sqlgGraph.tx().commit();
>> 
>>        StopWatch stopWatch = new StopWatch();
>>        for (int i = 0; i < 1000; i++) {
>>            stopWatch.start();
>>            GraphTraversal<Vertex, Vertex> traversal =
>> this.sqlgGraph.traversal()
>>                    .V().hasLabel("A")
>>                    .where(__.out().hasLabel("B"));
>>            List<Vertex> vertices = traversal.toList();
>>            Assert.assertEquals(count, vertices.size());
>>            stopWatch.stop();
>>            System.out.println(stopWatch.toString());
>>            stopWatch.reset();
>>        }
>>    }
>> 
>> Without the ArrayList optimization the output is,
>> 0:00:12.198
>> 0:00:09.756
>> 0:00:09.435
>> 0:00:14.466
>> 0:00:10.197
>> 0:00:04.937
>> 0:00:02.974
>> 0:00:02.942
>> 0:00:02.977
>> 0:00:03.142
>> 0:00:03.207
>> 
>> With the ArrayList optimization the output is,
>> 0:00:00.334
>> 0:00:00.147
>> 0:00:00.114
>> 0:00:00.100
>> ... time for jit
>> 0:00:00.055
>> 0:00:00.056
>> 0:00:00.054
>> 0:00:00.053
>> 0:00:00.054
>> 0:00:00.055
>> 
>> A significant difference.
>> 
>> For TinkerGraph this tests optimization is moot as the TraversalFilterStep
>> resets the step for every step making the TraverserSet's map empty so the
>> traversers equals method is never called.
>> 
>> Not sure if there are scenarios where this optimization will be any good
>> for TinkerGraph but thought I'd let you know how I am optimizing steps.
>> 
>> A concern is that I am now replacing core steps which makes Sqlg further
>> away from the reference implementation making it fragile to changes in
>> TinkerPop and harder to keep up to upstream changes. Perhaps there is a way
>> to make TravererSet's current behavior configurable?
>> 
>> Cheers
>> Pieter
>> 
>> 
>> 
>> 

Reply via email to