[ https://issues.apache.org/jira/browse/TINKERPOP-1254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15369701#comment-15369701 ]
ASF GitHub Bot commented on TINKERPOP-1254:
-------------------------------------------

Github user twilmes commented on the issue:

https://github.com/apache/tinkerpop/pull/358

I made a small update to `ReferencePath` to create new label Sets when a path is detached. This had been causing issues where the first set of labels for a path was being shared across `MutablePaths` after detachment. A label would be removed from one, and every traverser that had that label `Set` in its path would be affected. The `PathProcessors` now respect a null `keepLabels`, and labels are not dropped if `PrunePathStrategy` is not applied.

**PrunePathStrategy on**
```
g.V().match(
    as("a").in("sungBy").as("b"),
    as("a").in("sungBy").as("c"),
    as("b").out("writtenBy").as("d"),
    as("c").out("writtenBy").as("e"),
    as("d").has("name", "George_Harrison"),
    as("e").has("name", "Bob_Marley")).select("a").count().profile()

Traversal Metrics
Step                                                               Count  Traversers       Time (ms)    % Dur
=============================================================================================================
GraphStep(vertex,[])                                                 808         808          44.217    99.97
MatchStep(AND,[[MatchStartStep(a), ProfileStep,...                     1           1           0.004     0.01
  MatchStartStep(a)                                                  808         808          43.517
  VertexStep(IN,[sungBy],vertex)                                     501         499          20.323
  MatchEndStep(b)                                    (profiling ignored)                       0.000
  MatchStartStep(a)                                                    2           2           0.006
  VertexStep(IN,[sungBy],vertex)                                     156         156           2.129
  MatchEndStep(c)                                    (profiling ignored)                       0.000
  MatchStartStep(b)                                                  501         499           5.126
  VertexStep(OUT,[writtenBy],vertex)                                 509         504           3.423
  MatchEndStep(d)                                    (profiling ignored)                       0.000
  MatchStartStep(c)                                                  156         156           1.083
  VertexStep(OUT,[writtenBy],vertex)                                 157         157           1.029
  MatchEndStep(e)                                    (profiling ignored)                       0.000
  MatchStartStep(d)                                                  509         266           1.685
  HasStep([name.eq(George_Harrison)])                                  2           2           0.002
  MatchEndStep                                       (profiling ignored)                       0.000
  MatchStartStep(e)                                                  157          57           0.391
  HasStep([name.eq(Bob_Marley)])                                       1           1           0.001
  MatchEndStep                                       (profiling ignored)                       0.000
SelectOneStep(a)                                                       1           1           0.003     0.01
CountGlobalStep                                                        1           1           0.003     0.01
                                            >TOTAL                     -           -          44.228        -
```

**PrunePathStrategy off**
```
Traversal Metrics
Step                                                               Count  Traversers       Time (ms)    % Dur
=============================================================================================================
GraphStep(vertex,[])                                                 808         808           7.565    99.84
MatchStep(AND,[[MatchStartStep(a), ProfileStep,...                     1           1           0.007     0.10
  MatchStartStep(a)                                                  808         808           5.726
  VertexStep(IN,[sungBy],vertex)                                     501         499           9.532
  MatchEndStep(b)                                    (profiling ignored)                       0.000
  MatchStartStep(a)                                                    2           2           0.007
  VertexStep(IN,[sungBy],vertex)                                     156         156           1.803
  MatchEndStep(c)                                    (profiling ignored)                       0.000
  MatchStartStep(b)                                                  501         499           3.221
  VertexStep(OUT,[writtenBy],vertex)                                 509         504           2.297
  MatchEndStep(d)                                    (profiling ignored)                       0.000
  MatchStartStep(c)                                                  156         156           1.192
  VertexStep(OUT,[writtenBy],vertex)                                 157         157           0.910
  MatchEndStep(e)                                    (profiling ignored)                       0.000
  MatchStartStep(d)                                                  509         504           1.533
  HasStep([name.eq(George_Harrison)])                                  2           2           0.002
  MatchEndStep                                       (profiling ignored)                       0.000
  MatchStartStep(e)                                                  157         157           1.425
  HasStep([name.eq(Bob_Marley)])                                       1           1           0.000
  MatchEndStep                                       (profiling ignored)                       0.000
SelectOneStep(a)                                                       1           1           0.003     0.04
CountGlobalStep                                                        1           1           0.001     0.02
                                            >TOTAL                     -           -           7.577        -
```

I am running the integration tests now and will respond back when they complete.

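The label-sharing problem described in the comment above is an ordinary aliasing bug, and it may help to see it in isolation. The sketch below is not TinkerPop code: `LabelSetSharingSketch`, `detachShared`, and `detachCopied` are hypothetical names, and a path is reduced to a plain list of label `Set`s. It only illustrates why creating new `Set`s on detachment keeps one path's label removal from leaking into another.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a path's labels: one Set<String> per path position.
// This is NOT TinkerPop's ReferencePath or MutablePath, only an illustration.
final class LabelSetSharingSketch {

    // Shallow copy: the new list holds the SAME Set instances as the source.
    static List<Set<String>> detachShared(final List<Set<String>> labels) {
        return new ArrayList<>(labels);
    }

    // Defensive copy: every path position gets its own new Set.
    static List<Set<String>> detachCopied(final List<Set<String>> labels) {
        final List<Set<String>> copy = new ArrayList<>(labels.size());
        for (final Set<String> set : labels) {
            copy.add(new HashSet<>(set));
        }
        return copy;
    }

    // A single path position labeled both "a" and "b".
    static List<Set<String>> sampleLabels() {
        final List<Set<String>> labels = new ArrayList<>();
        labels.add(new HashSet<>(Arrays.asList("a", "b")));
        return labels;
    }

    public static void main(final String[] args) {
        final List<Set<String>> source = sampleLabels();
        final List<Set<String>> first = detachShared(source);
        final List<Set<String>> second = detachShared(source);
        first.get(0).remove("a"); // also removes "a" from 'second'
        System.out.println("shared, second keeps a: " + second.get(0).contains("a")); // false

        final List<Set<String>> source2 = sampleLabels();
        final List<Set<String>> third = detachCopied(source2);
        final List<Set<String>> fourth = detachCopied(source2);
        third.get(0).remove("a"); // 'fourth' is unaffected
        System.out.println("copied, fourth keeps a: " + fourth.get(0).contains("a")); // true
    }
}
```

With the shallow copy, dropping a label on one detached path silently drops it on every path that shares the `Set`; with the per-position copy, each path can be pruned independently.
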
> Support dropping traverser path information when it is no longer needed.
> ------------------------------------------------------------------------
>
>                 Key: TINKERPOP-1254
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP-1254
>             Project: TinkerPop
>          Issue Type: Improvement
>          Components: process
>    Affects Versions: 3.1.1-incubating
>            Reporter: Marko A. Rodriguez
>            Assignee: Ted Wilmes
>
> The most expensive traversals (especially in OLAP) are those that cannot be
> "bulked." There are various reasons why two traversers at the same object
> cannot be bulked, but the primary reason is {{PATH}} or {{LABELED_PATH}}. That
> is, when the history of the traverser is required, the probability of two
> traversers having the same history is low.
>
> A key to making traversals more efficient is to do as much as possible to
> remove historic information from a traverser so it can get bulked. How does
> one do this?
> {code}
> g.V.as('a').out().as('b').out().where(neq('a').and().neq('b')).both().name
> {code}
> The "a" and "b" labels of the {{LABELED_PATH}} are required up to the
> {{where()}}; from {{both()}} onward they are no longer required. It would be
> smart to support:
> {code}
> traverser.dropLabels(Set<String>)
> traverser.dropPath()
> {code}
> We would then, via a {{TraversalOptimizationStrategy}}, insert a step between
> {{where()}} and {{both()}} called {{PathPruneStep}}, which would be a
> {{SideEffectStep}}. The strategy would know which labels were no longer
> needed (via forward lookahead) and then do:
> {code}
> public class PathPruneStep {
>   final Set<String> dropLabels = ...
>   final boolean dropPath = ...
>   public void sideEffect(final Traverser<S> traverser) {
>     final Traverser<S> start = this.starts.next();
>     if(this.dropPath) start.dropPath();
>     else start.dropLabels(this.dropLabels);
>   }
> }
> {code}
> Again, the more we can prune historic path data no longer needed, the higher
> the probability of bulking. Think about this in terms of {{match()}}.
> {code}
> g.V().match(
>   a.out.b,
>   b.out.c,
>   c.neq.a,
>   c.out.b,
> ).select("a")
> {code}
> All we need is "a" at the end. Thus, once a pattern has been passed and no
> future patterns require that label, drop it!
>
> This idea is related to TINKERPOP-331, but I don't think we should deal with
> manipulating the species. Thus, I think 331 is too "low level."

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
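The bulking argument in the quoted issue can be made concrete with a small, self-contained sketch. It does not use TinkerPop: `SimpleTraverser`, `dropLabels`, and `bulk` are hypothetical names, and the assumption is that two traversers can be merged only when they sit at the same element and carry identical labeled history. Under that assumption, dropping labels that are no longer needed collapses otherwise distinct traversers into one bulked traverser.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class BulkingSketch {

    // Hypothetical, simplified traverser: a current element plus labeled history.
    static final class SimpleTraverser {
        final String current;
        final Map<String, String> labeledPath;

        SimpleTraverser(final String current, final Map<String, String> labeledPath) {
            this.current = current;
            this.labeledPath = new HashMap<>(labeledPath);
        }

        // Returns a copy without the given labels (the dropLabels idea from the issue).
        SimpleTraverser dropLabels(final Set<String> labels) {
            final Map<String, String> kept = new HashMap<>(this.labeledPath);
            kept.keySet().removeAll(labels);
            return new SimpleTraverser(this.current, kept);
        }
    }

    // Merges traversers with equal (current element, labeled history) and counts each group.
    static Map<List<Object>, Long> bulk(final List<SimpleTraverser> traversers) {
        final Map<List<Object>, Long> bulks = new HashMap<>();
        for (final SimpleTraverser t : traversers) {
            bulks.merge(Arrays.<Object>asList(t.current, t.labeledPath), 1L, Long::sum);
        }
        return bulks;
    }

    // Builds a labeled history with entries for "a" and "b".
    static Map<String, String> history(final String a, final String b) {
        final Map<String, String> path = new HashMap<>();
        path.put("a", a);
        path.put("b", b);
        return path;
    }

    public static void main(final String[] args) {
        // Three traversers at the same element "v9", each with a different history.
        final List<SimpleTraverser> withHistory = Arrays.asList(
                new SimpleTraverser("v9", history("v1", "v2")),
                new SimpleTraverser("v9", history("v3", "v4")),
                new SimpleTraverser("v9", history("v1", "v5")));
        System.out.println("distinct with labels: " + bulk(withHistory).size()); // 3

        // Once "a" and "b" are no longer needed, drop them and bulk again.
        final Set<String> unneeded = new HashSet<>(Arrays.asList("a", "b"));
        final List<SimpleTraverser> pruned = Arrays.asList(
                withHistory.get(0).dropLabels(unneeded),
                withHistory.get(1).dropLabels(unneeded),
                withHistory.get(2).dropLabels(unneeded));
        System.out.println("distinct after pruning: " + bulk(pruned).size()); // 1
    }
}
```

In this toy model the three traversers become one entry with a count of 3 after pruning, which is the effect the issue hopes a pruning strategy would have on real traversals.
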