[ https://issues.apache.org/jira/browse/TINKERPOP-1206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15197322#comment-15197322 ]
ASF GitHub Bot commented on TINKERPOP-1206: ------------------------------------------- GitHub user okram opened a pull request: https://github.com/apache/incubator-tinkerpop/pull/266 TINKERPOP-1206 & TINKERPOP-1014: Bulk Traversers and Requirements Optimization https://issues.apache.org/jira/browse/TINKERPOP-1206 https://issues.apache.org/jira/browse/TINKERPOP-1014 Two tangentially related issues are solved in this PR. The core of Gremlin is `Step`. We have this notion of an `ExpandableStepIterator` which knows either to get local traversers or traversers from the previous step. We use to have it where if you `addStarts(Iterator<Traverser>)`, it would add them to their own `MultiIterator`. This was pointless cause we don't get the benefit of bulking. Not only do we simplify the logic of this fundamental class, but we also reduce the amount of objects it creates/maintains and we all increase the likelihood of bulking (which is the ultimate potential optimization). The consequence is that we no longer support `O_Traverser` as bulking is ALWAYS required. This is fine -- very few traversals benefited from `O_Traverser` and all they gained was no `long` bulk field (trivial). `O_Traverser` is now abstract. This got me into thinking about how we calculate requirements and I realized how insanely stupid we (me) were especially for `by()`-modulators where EVERY call to a local traversal (typically `by()`-traversals) were recalculating the requirements!!!! That is, a full walk through the traversal tree. I decided to make `TraverserGenerator` static to the root traversal and when it is needed (the first time), it calculates the requirements and locks in the generator. Likewise, for child traversals, they look to the root traversal for the generator and lock it in locally. Thus, all this notion of requirements being expensive just vanished. We should keep our "traverser species" as calculating requirements is now cheap. CHANGELOG ``` * Optimized `ExpandableStepIterator` with simpler logic and increased the likelihood of bulking. * Optimized `TraverserRequirement` calculations. * `Step.processNextStart()` and `Step.next()` now return `Traverser.Admin<E>`. (*breaking*) * `Traversal.addTraverserRequirement()` method removed. (*breaking*) ``` UPDATE ``` Step API Update ^^^^^^^^^^^^^^^^ The `Step` interface is fundamental to Gremlin. `Step.processNextStart()` and `Step.next()` both retuned `Traverser<E>`. We had so many `Traverser.asAdmin()` and direct typecast calls throughout (especially in `TraversalVertexProgram`) that it was deemed prudent to have `Step.processNextStart()` and `Step.next()` return `Traverser.Admin<E>`. Moreover it makes sense as this is internal logic where admins are always needed. Providers with their own step definitions will simply need to change their method signatures (no logic update is required -- save that `asAdmin()` can be safely removed if used). This is a trivial update to make. Traversal API Update ^^^^^^^^^^^^^^^^^^^^ The way in which `TraverserRequirements` are calculated has been changed (for the better). The ramification is that post compilation requirement additions no longer make sense and should not be allowed. To enforce this, `Traversal.addTraverserRequirement()` method has been removed from the interface. Moreover, users should never be able to add requirements manually (this should all be inferred from the end compilation). However, if need be, there is always `RequirementStrategy` which will allow the user to add a requirement at strategy application time (though again, there should not be a reason to do so). ``` VOTE +1. You can merge this pull request into a Git repository by running: $ git pull https://github.com/apache/incubator-tinkerpop TINKERPOP-1206 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-tinkerpop/pull/266.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #266 ---- commit 5b4009d6418e818066d72c474123df1fbc081478 Author: Marko A. Rodriguez <okramma...@gmail.com> Date: 2016-03-16T02:49:55Z I've redesigned how TraverserRequirements are accessed. This GREATLY reduces the amount of logic required for all by()-modulators, StartSteps, GraphSteps, and barriers. I'm talking CRAZY more efficient when by()-modulators are used. I've made it so the Step API works with Traverser.Admin. We had too many .asAdmins() everywhere and typecasting everywhere -- no bueno. So much more clean now. For providers of Steps, breaking -- but its simply just changing your processNextStart() signature to return a Traverser.Admin. The amount of logic removed in this push is insane. This covers TINKERPOP-1206 and partially TINKERPOP-1014. The model I used for getTraverserGenerator() is really smart and we might want to use it for getSideEffects() and getStrategies(). In short, lazy computation of these data structures as they are rarely globally needed. This also means we can pretty signifanctly reduce the size of a Traversal serialization which is good for RemoteGraph. Finally found a major bug in GryoMapper -- registered XXXGeneartor, not XXX (for one species). Sucky. commit 90873e3cd640f45fd8872a0c19406bdfd1f3dc2a Author: Marko A. Rodriguez <okramma...@gmail.com> Date: 2016-03-16T13:13:32Z make the requirements unmodifiable for safety. ---- > ExpandableIterator can take a full TraverserSet at once -- Barriers. > -------------------------------------------------------------------- > > Key: TINKERPOP-1206 > URL: https://issues.apache.org/jira/browse/TINKERPOP-1206 > Project: TinkerPop > Issue Type: Improvement > Components: process > Affects Versions: 3.1.1-incubating > Reporter: Marko A. Rodriguez > > I haven't looked at {{ExpandableIterator}} in over a year. Its one of the > most fundamental structures of a Gremlin traversal. I just realized it can > take an entire {{TraverserSet}}. As such, if the previous step is a > {{Barrier}}, don't iterate the barrier out, simply "dump it" into the current > steps {{ExpandableIterator}}. That should speed up things significantly -- > though there are not that many barrier steps... but still. -- This message was sent by Atlassian JIRA (v6.3.4#6332)