[ 
https://issues.apache.org/jira/browse/TINKERPOP-1254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15308725#comment-15308725
 ] 

Marko A. Rodriguez commented on TINKERPOP-1254:
-----------------------------------------------

After a long walk and some thought, I think we should do this.

1. There is a {{PathPruning}} interface.
2. {{MatchStep}}, {{WhereStep}}, {{SelectStep}}, {{SelectOneStep}}, 
{{DedupGlobal}}, {{TreeStep}}, {{PathStep}} all implement it.
3. {{PathPruning.setDropLabels(Set<String> labels)}} exists.
4. The static method {{PathPruning.dropLabels(Traverser, Set<String>)}} exists.
5. {{PathPruningStrategy}} will identify any {{PathPruning}} steps, infer which 
steps after each one require path label information, and then call 
{{setDropLabels()}} accordingly.

Now there is no {{DropLabelStep}}, and only those steps that actually use path 
information have the logic to drop it accordingly. The benefits of this are 
that we don't introduce new steps (and the run time of each such step's 
iterator), {{MatchStep}} isn't "funky", and we don't run the risk of causing 
strategy compilation issues when multiple provider strategies have to reason on 
{{DropLabelStep}} insertions.
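
The forward look-ahead in point 5 could be sketched roughly as follows. This is only an illustration under assumptions: {{LookAheadSketch}} and {{droppableLabels()}} are hypothetical names, each path-using step is reduced to the set of labels it reads, and no TinkerPop classes are involved. An empty set in this sketch simply means "nothing becomes droppable here"; it does not follow the {{setDropLabels()}} convention described in the NOTES.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch, not TinkerPop API: models the strategy's look-ahead only.
public class LookAheadSketch {

    // requiredPerStep.get(i) holds the path labels read by the i-th path-using step.
    // Returns, for each step, the labels it reads that no later step reads.
    public static List<Set<String>> droppableLabels(final List<Set<String>> requiredPerStep) {
        final List<Set<String>> droppable = new ArrayList<>();
        final Set<String> neededLater = new HashSet<>();
        // Walk backwards so neededLater always holds the labels read by steps after i.
        for (int i = requiredPerStep.size() - 1; i >= 0; i--) {
            final Set<String> drop = new HashSet<>(requiredPerStep.get(i));
            drop.removeAll(neededLater);
            droppable.add(0, drop);
            neededLater.addAll(requiredPerStep.get(i));
        }
        return droppable;
    }

    public static void main(final String[] args) {
        final List<Set<String>> required = new ArrayList<>();
        // e.g. a where() that compares against "a" and "b"
        required.add(new HashSet<String>(Arrays.asList("a", "b")));
        // e.g. a later select("a") (assumed for illustration)
        required.add(new HashSet<String>(Arrays.asList("a")));
        System.out.println(droppableLabels(required));
    }
}
```

With these assumed inputs, the first step may drop "b" once it has run, while "a" has to survive until the second step has read it.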

NOTES:

{{PathPruning.setDropLabels(Set<String>)}} should do the following:
  1. If it is never called, the step's field stays {{null}}, which means 
don't prune.
  2. If it is called with {{Collections.emptySet()}} (i.e. an empty set), that 
means drop the full path.
  3. If it is called with a non-empty set, then those are the labels to 
drop.
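
A minimal sketch of these three cases, assuming a plain label-to-object map as a stand-in for the traverser's labeled path. {{DropLabelsSketch}} and {{prune()}} are hypothetical names used only for illustration; a real {{Path}} is richer than a map.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not TinkerPop API: shows only the null / empty / non-empty convention.
public class DropLabelsSketch {

    // labeledPath is an assumed stand-in for a traverser's labeled path (label -> object).
    public static Map<String, Object> prune(final Map<String, Object> labeledPath,
                                            final Set<String> dropLabels) {
        // Case 1: setDropLabels() was never called, so the field is null: don't prune.
        if (dropLabels == null) return labeledPath;
        // Case 2: an empty set means drop the full path.
        if (dropLabels.isEmpty()) return new LinkedHashMap<>();
        // Case 3: a non-empty set names the labels to drop.
        final Map<String, Object> pruned = new LinkedHashMap<>(labeledPath);
        pruned.keySet().removeAll(dropLabels);
        return pruned;
    }

    public static void main(final String[] args) {
        final Map<String, Object> path = new LinkedHashMap<>();
        path.put("a", "v[1]");
        path.put("b", "v[2]");
        System.out.println(prune(path, null));                        // path left untouched
        System.out.println(prune(path, Collections.emptySet()));      // full path dropped
        System.out.println(prune(path, Collections.singleton("b")));  // only "b" dropped
    }
}
```

The distinction between {{null}} and the empty set is what lets a single field express both "leave the path alone" and "drop everything".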


> Support dropping traverser path information when it is no longer needed.
> ------------------------------------------------------------------------
>
>                 Key: TINKERPOP-1254
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP-1254
>             Project: TinkerPop
>          Issue Type: Improvement
>          Components: process
>    Affects Versions: 3.1.1-incubating
>            Reporter: Marko A. Rodriguez
>            Assignee: Ted Wilmes
>
> The most expensive traversals (especially in OLAP) are those that cannot be 
> "bulked." There are various reasons why two traversers at the same object 
> cannot be bulked, but the primary reason is {{PATH}} or {{LABELED_PATH}}. That 
> is, when the history of the traverser is required, the probability of two 
> traversers having the same history is low.
> A key to making traversals more efficient is to do as much as possible to 
> remove historic information from a traverser so it can get bulked. How does 
> one do this? 
> {code}
> g.V.as('a').out().as('b').out().where(neq('a').and().neq('b')).both().name
> {code}
> The {{LABELED_PATH}} labels "a" and "b" are required up to the {{where()}}; 
> after that point, at {{both()}}, they are no longer required. It would be 
> smart to support:
> {code}
> traverser.dropLabels(Set<String>)
> traverser.dropPath()
> {code}
> We would then, via a {{TraversalOptimizationStrategy}}, insert a step between 
> {{where()}} and {{both()}} called {{PathPruneStep}}, which would be a 
> {{SideEffectStep}}. The strategy would know which labels were no longer 
> needed (via forward lookahead) and then do:
> {code}
> public class PathPruneStep<S> extends SideEffectStep<S> {
>   final Set<String> dropLabels = ...
>   final boolean dropPath = ...
>   public void sideEffect(final Traverser<S> traverser) {
>     // prune the traverser handed to this step instead of pulling another one from this.starts
>     if(this.dropPath) traverser.dropPath();
>     else traverser.dropLabels(this.dropLabels);
>   }
> }
> {code}
> Again, the more we can prune historic path data no longer needed, the higher 
> the probability of bulking. Think about this in terms of {{match()}}.
> {code}
> g.V().match(
>   a.out.b,
>   b.out.c,
>   c.neq.a,
>   c.out.b,
> ).select("a")
> {code}
> All we need is "a" at the end. Thus, once a pattern has been passed and no 
> future patterns require that label, drop it! 
> This idea is related to TINKERPOP-331, but I don't think we should deal with 
> manipulating the species. Thus, I think 331 is too "low level."



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
