[ 
https://issues.apache.org/jira/browse/TINKERPOP-1192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15179306#comment-15179306
 ] 

ASF GitHub Bot commented on TINKERPOP-1192:
-------------------------------------------

GitHub user okram opened a pull request:

    https://github.com/apache/incubator-tinkerpop/pull/251

    TINKERPOP-1192 & TINKERPOP-1193 & TINKERPOP-951 & TINKERPOP-958: The future 
is now.

    https://issues.apache.org/jira/browse/TINKERPOP-1192
    https://issues.apache.org/jira/browse/TINKERPOP-1193
    https://issues.apache.org/jira/browse/TINKERPOP-951
    https://issues.apache.org/jira/browse/TINKERPOP-958
    
    Summary: `TraversalSideEffects` and `GraphComputer` `Memory` are aligned. 
The are unified via `MemoryTraversalSideEffects` which gives us the flexibility 
to have a single `TraversalSideEffects` across an arbitrary number of OLAP jobs 
and finally makes it so that OLTP and OLAP use of sideEffects are consistent. 
This comes at the cost of a few API changes to `TraversalSideEffects`  and some 
semantics changes. For 99% of users, they won't notice. Only those that make 
heavy use of `sideEffect{x = ...}` may have to change their usage.
    
    ```
    g.V().out("created").
      group("m").by(label).
      pageRank(1.0).
        by("pageRank").
        by(inE()).
        times(1).
      in("created").
      group("m").by("pageRank").
        cap("m")
    ```
    
    The result is:
    
    [{2.0=[v[4], v[4], v[4], v[4]], 1.0=[v[6], v[6], v[6], v[1], v[1], v[1]], 
software=[v[5], v[3], v[3], v[3]]}]
                
    Do you see why this traversal is so cool? Check out what it is doing ---
    
        1. Normal out("created").group(). Fine and dandy all the software 
vertices are grouped.
        2. But then we kick off another OLAP job that computes a single-step 
spreading activation starting at the software vertices.
        3. Then, from the software vertices, it goes to the people who created 
that software and groups those people by their pageRank value.
    
    First, (already known), this traversal compiles to 3 OLAP jobs and it "just 
works." Slammin'.
    Second, as of this afternoon, TraversalSideEffects (GraphComputer.Memory) 
are preserved across OLAP jobs and across TraversalVertexPrograms.
        --- it truly is just one traversal.
    
    ------
    
    CHANGELOG
    
    ```
    * Fixed a severe `Traversal` cloning issue that caused inconsistent 
`TraversalSideEffects`.
    * `TraversalSideEffects` remain consistent and useable across multiple 
chained OLAP jobs.
    * Added `MemoryTraversalSideEffects` which backs a `TraversalSideEffects` 
by `Memory` in OLAP.
    * `TraversalSideEffects` are now fully functional in OLAP save that an 
accurate global view is possible after an iteration.
    * Updated the `TraversalSideEffects` API to support registered reducers and 
updated `get()`-semantics. (*breaking*)
    * Split existing `profile()` into `ProfileStep` and 
`ProfileSideEffectStep`. 
    * The `profile()`-step acts like a reducing barrier and emits 
`TraversalMetrics` without the need for `cap()`. (*breaking*)
    * Added `LocalBarrier` to allow traversers to remain distributed during an 
iteration so as to limit cluster traffic.
    * An OLAP-based `Barrier` synchronization bug has been fixed.
    ```
    
    UPDATE
    
    ```
    
    TraversalSideEffect Update
    ^^^^^^^^^^^^^^^^^^^^^^^^^^
    
    There were significant changes to `TraversalSideEffect` both at the 
semantic level and at the API level. Users that have traversals of the form 
`sideEffect{…}` that use global side-effects should read the following 
carefully. If the user only uses side-effects via non-lambda steps (e.g. 
`groupCount("m")`) and with `g.withSideEffects()`, then the changes below will 
not effect them. Providers will not be effected by the changes save any tests 
cases that use `Traversal.getSideEffects().get()`. This method no longer 
returns `Optional<V>`, but instead just the value `<V>`.
    
    TraversalSideEffect Get API Change
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    
    `TraversalSideEffects` can now logically operate within a distributed OLAP 
environment. In order to make this possible, it is necessary that each 
side-effect be registered with a reducing `BinaryOperator`. This binary 
operator will combine distributed updates into a single global side-effect at 
the master traversal. Many of the methods in `TraversalSideEffect` have been 
`Deprecated`, but they are backwards compatible save that 
`TraversalSideEffects.get()` no longer returns an `Optional`, but instead 
throws an `IllegalArgumentException`. While the `Optional` semantics could have 
remained, it was deemed best to directly return the side-effect value to reduce 
object creation costs and because all side-effects must be registered apriori 
and therefore there is never a reason why an unknown side-effect key would be 
used.
    
    TraversalSideEffect Registration Requirement
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    
    All `TraversalSideEffects` must be registered upfront. This is because, in 
OLAP, side-effects map to `Memory` compute keys and as such, must be declared 
prior to the execution of the `TraversalVertexProgram`. If a user's traversal 
creates a side-effect mid-traversal, it will fail. The traversal must use 
`GraphTraversalSource.withSideEffect()` to declare the side-effects it will use 
during its execution lifetime.
    
    TraversalSideEffect Add Requirement
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    
    In a distributed environment, a side-effect can not be mutated and be 
expected to exist in the mutated form at the final, aggregated, master 
traversal. For instance, if a side-effect "myCount" references a `Long`, the 
`Long` can not be updated directly via `sideEffects.set("myCount", 
sideEffects.get("myCount") + 1)`. Instead, it must rely on the reducer to do 
the merging and thus, the `Step` must do `sideEffect.add("mySet",1)` where the 
registered reducer is `Operator.sum`. Note that 
`Traverser.sideEffect(key,value)` will use `TraversalSideEffect.add()`. Thus, 
the below will increment "a". If no operator was provided, then the operator is 
assumed `Operator.assign` and the final result of "a" would be 1.
    
    `g.withSideEffect("a",0,sum).V().out().sideEffect(t -> t.sideEffect("a",1))`
    
    See 
link:https://issues.apache.org/jira/browse/TINKERPOP-1192[TINKERPOP-1192]
    
    ProfileStep Update and GraphTraversal API Change
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    
    The `profile()`-step has been refactored into 2 steps -- `ProfileStep` and 
`ProfileSideEffectStep`. Users who previously used the `profile()` in 
conjunction with `cap(TraversalMetrics.METRICS_KEY)` can now simply omit the 
cap step. Users who retrieved `TraversalMetrics` from the side-effects after 
iteration can still do so, but will need to specify a side-effect key when 
using the `profile()`. For example, `profile("myMetrics")`.
    
    See link:https://issues.apache.org/jira/browse/TINKERPOP-958[TINKERPOP-958]
    
    ```
    
    VOTE +1.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/apache/incubator-tinkerpop TINKERPOP-1192

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-tinkerpop/pull/251.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #251
    
----
commit 8e807eec73a72b4bbbde3156e6aa8b237d099b42
Author: rjbriody <bob.bri...@datastax.com>
Date:   2016-02-24T20:08:25Z

    TINKERPOP-958 split .profile() into ProfileSideEffectStep and ProfileStep 
(which is a MapStep).

commit 856bb564baad3a28754aa37a77803ab3d2f9b8a5
Author: rjbriody <bob.bri...@datastax.com>
Date:   2016-02-25T16:07:17Z

    Condense ProfileSideEffectTest into ProfileTest.

commit 7374ee45cd918f0770df3966476e98fac10794ae
Author: rjbriody <bob.bri...@datastax.com>
Date:   2016-02-26T14:52:20Z

    Fix computer issues for profile step.

commit 86941ee5c6ff37f2bc4f88ea520a530e9372ba79
Author: rjbriody <bob.bri...@datastax.com>
Date:   2016-03-01T13:35:45Z

    Update traversal docs for Profile step changes.

commit 06cdff86c21a9d2c20f528d9c620604dde1b6a07
Author: rjbriody <bob.bri...@datastax.com>
Date:   2016-03-01T13:39:34Z

    Update traversal docs for Profile step changes.

commit 6f6c9a5698373291c51e9b12af86a5dc3e09ff0e
Author: rjbriody <bob.bri...@datastax.com>
Date:   2016-03-01T13:45:38Z

    Revert "Update traversal docs for Profile step changes."
    
    This reverts commit 06cdff86c21a9d2c20f528d9c620604dde1b6a07.

commit ebb6143e9dffbc0980dcb1445a18088f3e7fd4fa
Author: rjbriody <bob.bri...@datastax.com>
Date:   2016-03-01T13:45:46Z

    Revert "Update traversal docs for Profile step changes."
    
    This reverts commit 86941ee5c6ff37f2bc4f88ea520a530e9372ba79.

commit 50c56563157337e71f74a940b9f6c17718860dd3
Author: rjbriody <bob.bri...@datastax.com>
Date:   2016-03-01T13:49:22Z

    Update traversal docs for Profile step changes.

commit 7013a9746cc04da32bb63fe0e6932a468c47ff0d
Author: rjbriody <bob.bri...@datastax.com>
Date:   2016-03-01T13:50:37Z

    merge master

commit 5f6ab602c2539f12c35c3be607f287dd689375af
Author: Marko A. Rodriguez <okramma...@gmail.com>
Date:   2016-03-02T21:23:20Z

    Created MemoryTraversalSideEffects which backs a TraversalSideEffects by 
GraphComputer.Memory. Everything works except AggregateStep and ProfileStep. 
Need more thinking. I found a fundamental bug in Step.clone() where SideEffects 
in the cloned traversal referenced the original traversal. It had to do with 
the fast the cloning of a step is like this -- step.clone(), 
step.setTraversal(traversal). As such, for all TraversalParent steps, I had to 
Override setTraversal(traversal) and it is there where 
TraversalParent.integrateChild(...) is called. However, once that was done, 
everything with MemoryTraversalSideEffects just worked (except Aggregate and 
Profile --- of course). This is a really sweet model. Removed 
VertexTraversalSideEffects which was the old model which stored sideEffects as 
traverser properties which were then processed and aggregated via MapReduce 
jobs. With MemoryTraversalSideEffects, the sideEffects are contained within the 
TraversalVertexProgram job --- its so slick (and way more efficient).

commit b63f62e7008614782bcd3c4ef5a8b4ea7e6eb830
Author: Marko A. Rodriguez <okramma...@gmail.com>
Date:   2016-03-02T23:30:54Z

    through hell and back. TraversalSideEffects are clean and proper. Added 
LocalBarrier interface which tells GraphComputer not to aggregate the 
traversers to the master traversal, but instead to keep them distributed across 
the cluster, but block -- that is, barrier. This is a way to force a 
sychronization without having to move data. Currently only AggregateStep 
implements LocalBarrier. However, other steps such as NoOpCollectingBarrierStep 
and SupplyingBarrierStep can use it. All test cases pass except for the 
ProfileTest stuff. Going to merge in the @rjbriody branch into this branch and 
get everything smooth and slick and then PR both from this branch.

commit 08780c4b8ceaa4231b4a8421fea064c0f8122f84
Author: Marko A. Rodriguez <okramma...@gmail.com>
Date:   2016-03-03T14:25:19Z

    Figured out why Groovy traversals were failing. I was overriding the 
SideEffects in ScriptTraversal. I needed to use the SideEffects from the parent 
traversal. Commented out the ProfileTests for now. Going to now try and merge 
@rjbriody work into this branch and then if all is gravy, PR both from here.

commit e4ea17395478132e7f78bd53a9921ef803ab6a96
Author: Marko A. Rodriguez <okramma...@gmail.com>
Date:   2016-03-03T14:28:44Z

    Merge branch 'TINKERPOP-958' of 
https://github.com/rjbriody/incubator-tinkerpop into TINKERPOP-1192
    
    Conflicts:
        
gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/sideEffect/ProfileSideEffectStep.java
        
gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/structure/io/gryo/GryoMapper.java
        
gremlin-groovy-test/src/main/java/org/apache/tinkerpop/gremlin/process/GroovyProcessComputerSuite.java
        
gremlin-test/src/main/java/org/apache/tinkerpop/gremlin/process/ProcessComputerSuite.java

commit ff7f7d2b74a42d7c9574a0ebbb8cc4a6897195f5
Author: Marko A. Rodriguez <okramma...@gmail.com>
Date:   2016-03-03T17:03:09Z

    added @rjbriody ProfileStep updates.

commit 5ec7dd063087dfd8c7cc1894bf86fa82a80a75ba
Author: Marko A. Rodriguez <okramma...@gmail.com>
Date:   2016-03-03T19:11:02Z

    Finalized the TravarsalSideEffect API work. There is one breaking change 
that was needed -- its small. However, the semantics of TraversalSideEffect now 
require all side-effects to be registered upfront via withSideEffect(). Users 
can no longer create a side-effect mid-traversal. This ensures that OLTP and 
OLAP behave identically. This occurs when people did something like --- 
g.V().sideEffect{it.sideEffect('a',new Set())}.out().out().... that sideEffect 
will need to be created at g.withSideEffect('a',Set::new).... Likewise. All 
other methods work but are deprecated and assume Operator.assign if now 
registered Reducer exists --- this will be okay for OLTP, but will lead to bad 
results in OLAP. In short, peoeople should really register reducers with their 
sideEffects -- g.V.withSideEffect(a,Set::new,Operator.addAll)......Added a 
pretty insan-o nested, sideEffect usage test to GroupCountTest -- OLTP and OLAP 
are happy.

commit ff390381b9896a16c73a056f167e84f2beb8f1d6
Author: Marko A. Rodriguez <okramma...@gmail.com>
Date:   2016-03-03T20:07:52Z

    added test to make sure declared SideEffect is ultimate returned class.

commit f778b5b0ab13c3860c30e39eadd64dcdd7af1c18
Author: Marko A. Rodriguez <okramma...@gmail.com>
Date:   2016-03-03T22:11:40Z

    Added some crazy tests to SideEffectTest which verify that registered 
operators and suppliers as well as mid-traversal side-effect calls behave as 
expected in both OLTP and OLAP. This required lots of tinkering with 
DefaultTraversalSideEffects to get the logic right around registeration. 
Removed a pointless TraversalSideEffectsTest we had. Doing all the complex 
logic in SideEffectTest.

commit 731789637d2b47306bed3a2f3008c9bf063a0f3a
Author: Marko A. Rodriguez <okramma...@gmail.com>
Date:   2016-03-03T23:58:00Z

    fixed a bug in DefaultTraversalSideEffects.mergedInto(). Added THE MOST 
INSANE test case to PageRankTest. SideEffects are preserved across 
GraphComputer jobs. Insane.

commit 02a551a5f544f8f3b2344631252c21c03a51f117
Author: Marko A. Rodriguez <okramma...@gmail.com>
Date:   2016-03-04T02:14:18Z

    added some nice test cases that ensure cloning and compiling respect 
TraversalSideEffect propagation. Added test cases that verify that the new 
TraversalSideEffect API changes are correctly implemented by 
DefaultTraversalSideEffect. All in all, just more tests cases to give us 
confidence.

commit 1c174ba5e5dfa025b41df541ba21b3637e17901e
Author: Marko A. Rodriguez <okramma...@gmail.com>
Date:   2016-03-04T02:31:34Z

    realized a very simple (and wildly efficient) optimization to 
TraversalVertexProgram around when to and not to return halted traversers to 
the master traversal. Okay. Done for the night.

----


> TraversalSideEffects should support registered reducers (binary operators).
> ---------------------------------------------------------------------------
>
>                 Key: TINKERPOP-1192
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP-1192
>             Project: TinkerPop
>          Issue Type: Improvement
>          Components: process
>    Affects Versions: 3.1.1-incubating
>            Reporter: Marko A. Rodriguez
>             Fix For: 3.2.0-incubating
>
>
> TinkerPop 3.2.0-SNAPSHOT has made leaps and bounds towards completely 
> aligning Gremlin OLTP and Gremlin OLAP. What has got me really excited is 
> that there is such a strong conceptual alignment between the following 
> components:
> {code}
>       VertexProgram <=> Traversal
>       Iteration <=> Step
>       Messages <=> Traversers
>       MessageCombiner <=> TraverserSet ("bulking")
>       BSP <=> Barrier
>       Workers <=> Parallel Steps
>       Master <=> Sequential Steps
>       Memory <=> SideEffects
> {code}
> TraversalVertexProgram is very clean -- its lays atop the GraphComputer API 
> in a natural, effortless way.
> However, there is one last pairing that needs some better alignment: 
> GraphComputer Memory and Traversal SideEffects. A Memory slot has the notion 
> of a key, a value, and a reducer (binary operator). A Traversal SideEffect as 
> the notion of a key and a value. I think we should enable Traversal 
> SideEffects to support registered reducers. If we do this, then there is 
> perfect alignment between the two models and we won't have to have 
> "if(onGraphComputer)"-type logic in our side-effect steps.
> Right now in GroupCountSideEffectStep we do this:
> {code}
> public void sideEffect(final Traverser<S> traverser) {
>   Map<E,Long> groupCountMap = 
> this.getTraversal().getSideEffects().get(this.sideEffectKey);
>   MapHelper.incr(traverser.get(), traverser.bulk(), groupCountMap)
> }
> {code}
> We are explicitly getting the Map from the sideEffects and updating it. This 
> model will not generally work in OLAP because groupCountMap is a distributed 
> data structure and thus, local updates to a Map don't distribute. I have it 
> working currently in master/, but at the cost of not being able to read the 
> sideEffect, only write to it. To make TraversalSideEffects consistent across 
> both OLTP and OLAP, I think we should express GroupCountSideEffectStep like 
> this (*** analogously for GroupSideEffectStep, TreeSideEffectStep, etc.):
> {code}
> public void sideEffect(final Traverser<S> traverser) {
>   this.getTraversal().getSideEffects().add(this.sideEffectKey, 
> Collections.singletonMap(traverser.get(), traverser.bulk());
> }
> {code}
> Moreover, TraversalSideEffects should have the following method:
> {code}
>       TraversalSideEffects.register(final String key, final Supplier<A> 
> initialValue, final BinaryOperator<A> reducer)
> {code}
> Note that we already have:
>       
> https://github.com/apache/incubator-tinkerpop/blob/master/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/TraversalSideEffects.java#L88
> We can deprecate the current registerSupplier() in support of register(). 
> Moreover, for backwards compatibility, BinaryOperator<A> reducer would simply 
> be Operator.assign. 
>       
> https://github.com/apache/incubator-tinkerpop/blob/master/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Operator.java#L59-L62
> Thus, this would not be a breaking change and it will ensure a natural 
> congruence between these two related computing structures -- Memory and 
> TraversalSideEffects.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to