[jira] [Commented] (TINKERPOP-1254) Support dropping traverser path information when it is no longer needed.

ASF GitHub Bot (JIRA) Fri, 08 Jul 2016 06:31:05 -0700

    [ 
https://issues.apache.org/jira/browse/TINKERPOP-1254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15367673#comment-15367673
 ]


ASF GitHub Bot commented on TINKERPOP-1254:
-------------------------------------------

GitHub user twilmes opened a pull request:

    https://github.com/apache/tinkerpop/pull/358

    TINKERPOP-1254 Support dropping traverser path information when it is no 
longer needed

    This PR adds support for path retraction to increase the likelihood of 
bulking in OLTP and OLAP modes. Traversal analysis is performed during the 
application of the PrunePathStrategy to identify labels that may be dropped at 
various points in the traversal.  MatchStep also performs runtime analysis to 
determine which labels it can drop in addition to the labels identified during 
traversal strategy application.
    
    Here is a set of profiles showing the benefit of path dropping.  These were 
generated with `TinkerGraphComputer` first against 3.2 and then TinkerPop-1254. 
 **Note** that the times should not be compared here.  The first was run on my 
anemic macbook which, and the second was run on an 8 core AWS m3.2xlarge 
instance that I've been using for testing.  I'll be following up with more 
numbers on the same hardware but you can see here the dramatic drop in 
traversers with path pruning enabled.
    
    **TP 3.2**
    ```
    gremlin> g.V().match(__.as('a').out().as('b'), __.as('b').out().as('c'), 
__.as('c').out().as('d')).select('d').count().profile()
    ==>Traversal Metrics
    Step                                                               Count  
Traversers       Time (ms)    % Dur
    
=============================================================================================================
    TinkerGraphStep(vertex,[])                                           808    
     808          21.205     0.02
    MatchStep(AND,[[MatchStartStep(a), ProfileStep,...              14465066    
14465066      110317.813    85.88
      MatchStartStep(a)                                                  808    
     808       10975.797
      VertexStep(OUT,vertex)                                            8049    
    8049        6953.071
      MatchEndStep(b)                                                   8049    
    8049        6727.167
      MatchStartStep(b)                                                 8049    
    7957        5461.242
      VertexStep(OUT,vertex)                                          327370    
  327370        6782.024
      MatchEndStep(c)                                                 327370    
  327370        6238.268
      MatchStartStep(c)                                               327370    
  326983        1771.773
      VertexStep(OUT,vertex)                                        14465066    
14465066       11489.228
      MatchEndStep(d)                                               14465066    
14465066       14301.313
    SelectOneStep(d)                                                14465066    
14465066       13752.966    10.71
    CountGlobalStep                                                        1    
       1        4363.667     3.40
                                                >TOTAL                     -    
       -      128455.652        -
    ```
     **TinkerPop-1254**
    ```
    gremlin> g.V().match(__.as('a').out().as('b'), __.as('b').out().as('c'), 
__.as('c').out().as('d')).select('d').count().profile()
    ==>Traversal Metrics
    Step                                                               Count  
Traversers       Time (ms)    % Dur
    
=============================================================================================================
    GraphStep(vertex,[])                                                 808    
     808          32.453    19.96
    MatchStep(AND,[[MatchStartStep(a), ProfileStep,...              14465066    
    7510          89.463    55.01
      MatchStartStep(a)                                                  808    
     808          22.388
      VertexStep(OUT,vertex)                                            8049    
    7957          85.493
      MatchEndStep(b) (profiling ignored)                                       
                   0.000
      MatchStartStep(b)                                                 8049    
     563           7.488
      VertexStep(OUT,vertex)                                          327370    
    7561          19.548
      MatchEndStep(c) (profiling ignored)                                       
                   0.000
      MatchStartStep(c)                                               327370    
     452           4.247
      VertexStep(OUT,vertex)                                        14465066    
    7510          14.812
      MatchEndStep(d) (profiling ignored)                                       
                   0.000
    SelectOneStep(d)                                                14465066    
     452          40.711    25.03
    CountGlobalStep                                                        1    
       1           0.001     0.00
                                                >TOTAL                     -    
       -         162.630        -
    ```
    
    One thing that I did want to call out.  I was seeing some strange errors in 
the `MutablePath.retract` method that I traced to multiple threads attempting 
to do a retract at the same time.  I did not think this should happen or would 
be possible but I may be misunderstanding how `GraphComputer` executes.  To fix 
this, I added a `synchronized` block.  I was hesitant to do this but it fixed 
the problem and appears to have not slowed things down appreciably.  There may 
be a better way around this though so any input would be welcome.
    
    Tests pass and I got a passing integration test last night but made a few 
changes since then.  I'll follow up with new integration test results when they 
finish running today.
    
    VOTE: +1 

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/apache/tinkerpop TINKERPOP-1254

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/tinkerpop/pull/358.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #358
    
----
commit 88c5d27bd4d0e2bb898be7f4c3a86007d6422174
Author: Ted Wilmes <twil...@gmail.com>
Date:   2016-07-08T12:37:41Z

    TINKERPOP-1254 Support dropping traverser path information when it is no 
longer needed
    
    This commit adds support for path retraction to increase the likelihood of 
bulking in OLTP and OLAP modes. Traversal analysis is performed during the 
application of the PrunePathStrategy to identify labels that may be dropped at 
various points in the traversal.  MatchStep also performs runtime analysis to 
determine which labels it can drop in addition to the labels identified during 
traversal strategy application.

----


> Support dropping traverser path information when it is no longer needed.
> ------------------------------------------------------------------------
>
>                 Key: TINKERPOP-1254
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP-1254
>             Project: TinkerPop
>          Issue Type: Improvement
>          Components: process
>    Affects Versions: 3.1.1-incubating
>            Reporter: Marko A. Rodriguez
>            Assignee: Ted Wilmes
>
> The most expensive traversals (especially in OLAP) are those that can not be 
> "bulked." There are various reasons why two traversers at the same object can 
> not be bulked, but the primary reason is {{PATH}} or {{LABELED_PATH}}. That 
> is, when the history of the traverser is required, the probability of two 
> traversers having the same history is low.
> A key to making traversals more efficient is to do as a much as possible to 
> remove historic information from a traverser so it can get bulked. How does 
> one do this? 
> {code}
> g.V.as('a').out().as('b').out().where(neq('a').and().neq('b')).both().name
> {code}
> The {{LABELED_PATH}} of "a" and "b" are required up to the {{where()}} and at 
> which point, at {{both()}}, they are no longer required. It would be smart to 
> support:
> {code}
> traverser.dropLabels(Set<String>)
> traverser.dropPath()
> {code}
> We would then, via a {{TraversalOptimizationStrategy}} insert a step between 
> {{where()}} and {{both()}} called {{PathPruneStep}} which would be a 
> {{SideEffectStep}}. The strategy would know which labels were no longer 
> needed (via forward lookahead) and then do:
> {code}
> public class PathPruneStep {
>   final Set<String> dropLabels = ...
>   final boolean dropPath = ...
>   public void sideEffect(final Traverser<S> traverser) {
>     final Traverser<S> start = this.starts.next();
>     if(this.dropPath) start.dropPath();
>     else start.dropLabels(labels); 
>   }
> }
> {code}
> Again, the more we can prune historic path data no longer needed, the higher 
> the probability of bulking. Think about this in terms of {{match()}}.
> {code}
> g.V().match(
>   a.out.b,
>   b.out.c,
>   c.neq.a,
>   c.out.b,
> ).select("a")
> {code}
> All we need is "a" at the end. Thus, once a pattern has been passed and no 
> future patterns require that label, drop it! 
> This idea is related to TINKERPOP-331, but I don't think we should deal with 
> manipulating the species. Thus, I think 331 is too "low level."



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (TINKERPOP-1254) Support dropping traverser path information when it is no longer needed.

Reply via email to