GitHub user okram opened a pull request:

    https://github.com/apache/incubator-tinkerpop/pull/227

    TINKERPOP-1140: TraversalVertexProgramStep in support of OLAP/OLTP 
conversions.

    https://issues.apache.org/jira/browse/TINKERPOP-1140
    
    `Traversals` can now be composed of multiple OLAP jobs. This ticket in 
particular introduced `TraversalVertexProgramStep` which is responsible for 
executing a OLAP job and then sending its results to the next OLAP step 
(implementing a `VertexComputing`) or, it just drains the results serially to 
the next OLTP step. To demonstrate this working, I added 
`PageRankVertexProgramStep` (experimental for now -- a future commit will make 
all of this more explicit and include `PeerPressureVertexProgramStep`. This 
ticket is particular to `TraversalVertexProgramStep`). Here is the glory.
    
    ```
    gremlin> g = TinkerFactory.createModern().traversal().withComputer()
    ==>graphtraversalsource[tinkergraph[vertices:6 edges:6], 
tinkergraphcomputer]
    gremlin> 
g.V().pageRank().order().by(PageRankVertexProgram.PAGE_RANK).valueMap()
    ==>[gremlin.pageRankVertexProgram.pageRank:[0.15000000000000002], 
name:[peter], gremlin.pageRankVertexProgram.edgeCount:[1.0], age:[35]]
    ==>[gremlin.pageRankVertexProgram.pageRank:[0.15000000000000002], 
name:[marko], gremlin.pageRankVertexProgram.edgeCount:[3.0], age:[29]]
    ==>[gremlin.pageRankVertexProgram.pageRank:[0.19250000000000003], 
name:[josh], gremlin.pageRankVertexProgram.edgeCount:[2.0], age:[32]]
    ==>[gremlin.pageRankVertexProgram.pageRank:[0.19250000000000003], 
name:[vadas], gremlin.pageRankVertexProgram.edgeCount:[0.0], age:[27]]
    ==>[gremlin.pageRankVertexProgram.pageRank:[0.23181250000000003], 
name:[ripple], gremlin.pageRankVertexProgram.edgeCount:[0.0], lang:[java]]
    ==>[gremlin.pageRankVertexProgram.pageRank:[0.4018125], name:[lop], 
gremlin.pageRankVertexProgram.edgeCount:[0.0], lang:[java]]
    gremlin> 
g.V().pageRank().order().by(PageRankVertexProgram.PAGE_RANK).valueMap().explain()
    ==>Traversal Explanation
    
========================================================================================================================================================================================================================================
    Original Traversal                 [GraphStep([],vertex), 
PageRankVertexProgramStep([VertexStep(OUT,edge)]), 
OrderGlobalStep([incr(value(gremlin.pageRankVertexProgram.pageRank))]), 
PropertyMapStep(value)]
    
    ConnectiveStrategy           [D]   [GraphStep([],vertex), 
PageRankVertexProgramStep([VertexStep(OUT,edge)]), 
OrderGlobalStep([incr(value(gremlin.pageRankVertexProgram.pageRank))]), 
PropertyMapStep(value)]
    VertexProgramStrategy        [D]   
[PageRankVertexProgramStep([VertexStep(OUT,edge)]), 
TraversalVertexProgramStep([OrderGlobalStep([incr(value(gremlin.pageRankVertexProgram.pageRank))])]),
 ComputerResultStep, PropertyMapStep(value)]
    IdentityRemovalStrategy      [O]   
[PageRankVertexProgramStep([VertexStep(OUT,edge)]), 
TraversalVertexProgramStep([OrderGlobalStep([incr(value(gremlin.pageRankVertexProgram.pageRank))])]),
 ComputerResultStep, PropertyMapStep(value)]
    FilterRankingStrategy        [O]   
[PageRankVertexProgramStep([VertexStep(OUT,edge)]), 
TraversalVertexProgramStep([OrderGlobalStep([incr(value(gremlin.pageRankVertexProgram.pageRank))])]),
 ComputerResultStep, PropertyMapStep(value)]
    IncidentToAdjacentStrategy   [O]   
[PageRankVertexProgramStep([VertexStep(OUT,edge)]), 
TraversalVertexProgramStep([OrderGlobalStep([incr(value(gremlin.pageRankVertexProgram.pageRank))])]),
 ComputerResultStep, PropertyMapStep(value)]
    AdjacentToIncidentStrategy   [O]   
[PageRankVertexProgramStep([VertexStep(OUT,edge)]), 
TraversalVertexProgramStep([OrderGlobalStep([incr(value(gremlin.pageRankVertexProgram.pageRank))])]),
 ComputerResultStep, PropertyMapStep(value)]
    MatchPredicateStrategy       [O]   
[PageRankVertexProgramStep([VertexStep(OUT,edge)]), 
TraversalVertexProgramStep([OrderGlobalStep([incr(value(gremlin.pageRankVertexProgram.pageRank))])]),
 ComputerResultStep, PropertyMapStep(value)]
    RangeByIsCountStrategy       [O]   
[PageRankVertexProgramStep([VertexStep(OUT,edge)]), 
TraversalVertexProgramStep([OrderGlobalStep([incr(value(gremlin.pageRankVertexProgram.pageRank))])]),
 ComputerResultStep, PropertyMapStep(value)]
    TinkerGraphStepStrategy      [P]   
[PageRankVertexProgramStep([VertexStep(OUT,edge)]), 
TraversalVertexProgramStep([OrderGlobalStep([incr(value(gremlin.pageRankVertexProgram.pageRank))])]),
 ComputerResultStep, PropertyMapStep(value)]
    ProfileStrategy              [F]   
[PageRankVertexProgramStep([VertexStep(OUT,edge)]), 
TraversalVertexProgramStep([OrderGlobalStep([incr(value(gremlin.pageRankVertexProgram.pageRank))])]),
 ComputerResultStep, PropertyMapStep(value)]
    StandardVerificationStrategy [V]   
[PageRankVertexProgramStep([VertexStep(OUT,edge)]), 
TraversalVertexProgramStep([OrderGlobalStep([incr(value(gremlin.pageRankVertexProgram.pageRank))])]),
 ComputerResultStep, PropertyMapStep(value)]
    ComputerVerificationStrategy [V]   
[PageRankVertexProgramStep([VertexStep(OUT,edge)]), 
TraversalVertexProgramStep([OrderGlobalStep([incr(value(gremlin.pageRankVertexProgram.pageRank))])]),
 ComputerResultStep, PropertyMapStep(value)]
    
    Final Traversal                    
[PageRankVertexProgramStep([VertexStep(OUT,edge)]), 
TraversalVertexProgramStep([OrderGlobalStep([incr(value(gremlin.pageRankVertexProgram.pageRank))])]),
 ComputerResultStep, PropertyMapStep(value)]
    gremlin>
    gremlin> 
g.V().pageRank().order().by(PageRankVertexProgram.PAGE_RANK).valueMap().toString()
    ==>[GraphStep([],vertex), 
PageRankVertexProgramStep([VertexStep(OUT,edge)]), 
OrderGlobalStep([incr(value(gremlin.pageRankVertexProgram.pageRank))]), 
PropertyMapStep(value)]
    gremlin> 
g.V().pageRank().order().by(PageRankVertexProgram.PAGE_RANK).valueMap().iterate().toString()
    ==>[PageRankVertexProgramStep([VertexStep(OUT,edge)]), 
TraversalVertexProgramStep([OrderGlobalStep([incr(value(gremlin.pageRankVertexProgram.pageRank))])]),
 ComputerResultStep, PropertyMapStep(value)]
    ```
    
    CHANGELOG:
    
    ```
    * Fixed a bug in both `SparkGraphComputer` and `GiraphGraphComputer` 
regarding source data access in job chains.
    * Significantly extended job chaining test coverage for `GraphComputer` 
providers.
    * Added `TraversalHelper.onGraphComputer(traversal)`.
    * `MapReduce.map()` no longer has a default implementation. This method 
must be implemented.
    * `TraversalVertexProgram` can work without a `GraphStep` start.
    * Fixed a severe bug in `TraverserMapReduce.combine()`.
    * Added `PageRankVertexProgramStep` and `GraphTraversal.pageRank()`.
    * Simplified the comparator model in `OrderGlobalStep` and `OrderLocalStep`.
    ```
    
    The only note for the upgrade docs will be that `MapReduce.map()` no longer 
have a default implementation. However, if you didn't implement it, your 
`MapReduce` job would instantly fail so I don't think anyone has code like 
that. Oh, and `TraversalHelper.onGraphComputer()` will replace the "stub 
method" I had in the previous PR with `TraversalStrategies.onGraphComputer()`. 
Thus, minor nothings to upgrade docs.
    
    *** NOTE: I will add `pageRank()` to the reference doc in a future 3.2.0 
commit when I do another ticket that is more specific to 
`PageRankVertexProgramStep` and `PeerPressureVertexProgram`.
    
    VOTE +1. `mvn clean install` and integration tests -- (though Giraph is 
currently still running -- but I have isolated and verified the two tests I 
added to the `ProcessComputerSuite`).

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/apache/incubator-tinkerpop TINKERPOP-1140

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-tinkerpop/pull/227.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #227
    
----
commit 64cd1b18df1170e96d61c0e68cc6495d32c7fa3e
Author: Marko A. Rodriguez <okramma...@gmail.com>
Date:   2016-02-09T22:29:31Z

    TraversalVertexProgramStep created. It works. Sorta --- need to get 
ComputerVerificationStrategy more aware. 95 percent of tests pass -- the ones 
that fail have to do with deep nested traverasls and 
ComputerVerificationStrategy not catching them. Anywho, once this is solid, the 
door is open for PageRankVertexProgramStep. Going to get crazy here soon. Also, 
ComputerResultStep now simply traversifies ComputerResults. Really elegant.

commit f2dbb0f2094bb7044e3c1f8317fc2153c9010b2f
Author: Marko A. Rodriguez <okramma...@gmail.com>
Date:   2016-02-10T18:11:35Z

    Have TraversalVertexProgramStep working fully. I had to Ignore tests in 
ExplainTest and ProfileTest as OLTP and OLAP now no longer compile to the same 
representation. This is both good and bad. Its good because that means that a 
Traversal is not globally OLAP or globally OLTP. It can have OLAP and OLTP 
sections. Its bad in that features like Profile and Explain need to be smart 
about mixed-engine traversals. I am still far from finished ... just want to 
push to save work.

commit e5c3d784e2db06febd4da1d079712336818fd0ca
Author: Marko A. Rodriguez <okramma...@gmail.com>
Date:   2016-02-10T18:59:04Z

    fixed a bug around Serialization of GraphComputer. Integration tests now 
pass. Cool.

commit 6cbd8a5da0b8dad4f675c586a809e661a902076b
Author: Marko A. Rodriguez <okramma...@gmail.com>
Date:   2016-02-10T19:36:20Z

    Merge branch 'TINKERPOP-971' into TINKERPOP-1140

commit 080863a5837d94417b9f9fc17d0476defab56e63
Author: Marko A. Rodriguez <okramma...@gmail.com>
Date:   2016-02-10T22:23:02Z

    okay. TraversalVertexProgramStep is just like any other step. Nothing 
'special'. Its a TraversalParent with one global child -- the traversal to be 
executed on the GraphComputer. I got rid of 
TraversalSideEffects.onGraphComputer() as again, a traversal can now have OLTP 
and OLAP components. In its place, we have 
TraversalHelper.onGraphComputer(traversal) which will tell you if a parent of 
the traversal is a TraversalVertexProgramStep. this model is really really nice 
as it all falls within the natural construts of our Traversal API -- no new 
interfaces needed, no weird two sets of strategies, no weird two sets of 
sideEffects... Its really clean.

commit 3afc1894a4805e82ad076535697add3baac91c6c
Author: Marko A. Rodriguez <okramma...@gmail.com>
Date:   2016-02-11T01:41:07Z

    Stubbed PageRankVertexProgramStep. Cleaned up and optimized a few tid bits 
here and there. Its going to get nasty tomorrow.

commit a4cf1514d95dfd873675f08e6fd5d08aea810992
Author: Marko A. Rodriguez <okramma...@gmail.com>
Date:   2016-02-11T19:39:30Z

    Okay. Here is is. g.V().pageRank().order.by(pageRank).name.limit(10). We 
know support chained OLAP jobs within a single Traversal. This is an amazing 
piece of code. There is still lots more to do, but the basic framework is 
working and is sound. Note that we were able to loosen up lots of 
ComputerVerificaitonStrategy restrictions. Moreover, over the next couple of 
pushes, we will loosen up even more. The OrderStep comparator stuff was a rats 
nest that I have since cleaned up ... sometimes it wants a traverser, sometimes 
an object...random. I can't believe we haven't seen more bugs cause of what we 
had.

commit e2e38e691ee20ba697c2e737e678376a10f87da2
Author: Marko A. Rodriguez <okramma...@gmail.com>
Date:   2016-02-11T19:46:28Z

    Merge branch 'master' into TINKERPOP-1140

commit 17bfdb82ce89e24295940ad1926ae95e17eea821
Author: Marko A. Rodriguez <okramma...@gmail.com>
Date:   2016-02-11T21:32:30Z

    Was really dumb about SparkMessenger and GiraphMessenger. No need to insert 
a start step. Weird. ComputerVerificationStrategy can now recognize when edge 
properties are being manipulated in MapReduce. Added 
TraveralHelper.getLastElementClass(traversal). Neato. Fixed up FileSystem 
access in SparkGraphComputer. I think we have a bug in 3.1.2 that makes 
chaining files bad. Weird we don't have a test for that. However, most people 
don't chain vertex programs so I suspect no one will notice. 3.2.0 have it down 
pat as you need solid chaining the new multi-OLAP compiler to work.

commit 66d86cdfbb00bf1eb74f067815181f90b40d6feb
Author: Marko A. Rodriguez <okramma...@gmail.com>
Date:   2016-02-11T21:53:00Z

    there is a bug in GiraphGraphComputer when chaining jobs together. damn. I 
have things a bit cleaner. committing what I have for now.

commit cd1a37946848d28ad33d464fedac1655cadd5320
Author: Marko A. Rodriguez <okramma...@gmail.com>
Date:   2016-02-12T18:57:05Z

    Found a couple of bugs around job chaining in all GraphComputer 
implementations. Wrote a solid GraphComputerTest to ensure job chaining behaves 
as expected. Still a few more things to handle -- the Traversal.pageRank() 
tests in Giraph don't pass and I'm not sure why.

commit 0a2243037792854c6ef9f834f41785e7a728ee7e
Author: Marko A. Rodriguez <okramma...@gmail.com>
Date:   2016-02-12T21:02:46Z

    Found a bug in TraverserMapReduce where the combiner should not just call 
reduce(). Only Giraph was able to expose it because it actually calls 
combine(). The test dataset is so small that Spark doesn't even kick it off. 
TinkerGraph, because its in memory, doesn't use combine. Wow. That was a two 
day bug. However, I have written so many more test cases. Things are really 
pretty. And that ends my week with g.V.pageRank().order().by().limit(10) 
working. Phew.

commit 491bef66112d75d42cde87ae000e63587e9b3bfe
Author: Marko A. Rodriguez <okramma...@gmail.com>
Date:   2016-02-12T22:15:18Z

    added another PageRankTest test case and made MapReduce.map() a non-default 
implementation so users know they have to implement that method.

commit d627f9c06cd24b9c12d0938e3ebff20a39ae33b9
Author: Marko A. Rodriguez <okramma...@gmail.com>
Date:   2016-02-12T22:31:12Z

    minor nothing.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

Reply via email to