Hi,

TinkerPop 3.2.0 boasts the ability to have multiple OLAP/OLTP jobs within a single Traversal instance. The following tickets have been closed and merged to master/ (TinkerPop 3.2.0-SNAPSHOT).

https://issues.apache.org/jira/browse/TINKERPOP-1140
https://issues.apache.org/jira/browse/TINKERPOP-971
https://issues.apache.org/jira/browse/TINKERPOP-962

Instead of going through the gnarly details, I will explain with a simple example:

gremlin> g = TinkerFactory.createModern().traversal().withComputer()
==>graphtraversalsource[tinkergraph[vertices:6 edges:6], tinkergraphcomputer]
gremlin> g.V().pageRank().order().by(PAGE_RANK).valueMap()
==>[gremlin.pageRankVertexProgram.pageRank:[0.15000000000000002], name:[peter], age:[35]]
==>[gremlin.pageRankVertexProgram.pageRank:[0.15000000000000002], name:[marko], age:[29]]
==>[gremlin.pageRankVertexProgram.pageRank:[0.19250000000000003], name:[josh], age:[32]]
==>[gremlin.pageRankVertexProgram.pageRank:[0.19250000000000003], name:[vadas], age:[27]]
==>[gremlin.pageRankVertexProgram.pageRank:[0.23181250000000003], name:[ripple], lang:[java]]
==>[gremlin.pageRankVertexProgram.pageRank:[0.4018125], name:[lop], lang:[java]]
gremlin>
gremlin> g.V().pageRank().order().by(PageRankVertexProgram.PAGE_RANK).valueMap().toString()
==>[GraphStep([],vertex), PageRankVertexProgramStep([VertexStep(OUT,edge)]), OrderGlobalStep([incr(value(gremlin.pageRankVertexProgram.pageRank))]), PropertyMapStep(value)]
gremlin>
gremlin> g.V().pageRank().order().by(PageRankVertexProgram.PAGE_RANK).valueMap().iterate().toString()
==>[PageRankVertexProgramStep([VertexStep(OUT,edge)]), TraversalVertexProgramStep([OrderGlobalStep([incr(value(gremlin.pageRankVertexProgram.pageRank))])]), ComputerResultStep, PropertyMapStep(value)]

As you can see from the compilation, we have two OLAP jobs and one OLTP job in a single Traversal! TraversalVertexProgramStep was always what an OLAP Gremlin traversal was, but now it's a step in and of itself, just like any other step on the Gremlin machine.

OLAP  [PageRankVertexProgramStep([VertexStep(OUT,edge)]),
OLAP   TraversalVertexProgramStep([OrderGlobalStep([incr(value(gremlin.pageRankVertexProgram.pageRank))])]),
OLTP   ComputerResultStep, PropertyMapStep(value)]

There is still more work to be done/tweaked in this area for 3.2.0. This is what this [DISCUSS] email is for. There are some decisions we can make, and I would like people's thoughts on the matter before we make them:

------------------------------------------------------------------------

1. Parameterizing OLAP steps.

What about the following parameterization:

g.V().pageRank(0.85).times(20).by(outE('knows')).by('page.rank')

This says I want a PageRankVertexProgram executed with an alpha parameter of 0.85, iterating 20 times, using "knows" edges for the energy diffusion, and saving the result on the vertex under the property "page.rank". As you may know, "times" and "by" are step-modulators. However, when by(string) is used, it currently compiles to by(values(string).limit(1)). If we go down this road of adding more VertexPrograms and more by-modulations, I think we need a new interface called ByModulating that allows the step (not the Traversal) to decide what by(traversal), by(string), by(number), by(function), etc. mean to it. Likewise, we would need TimesModulating, since RepeatStep and PageRankVertexProgramStep will do two different things with that information. Thoughts?

2. Traversers as source points to OLAP steps.

Imagine the following traversal:

g.V().has('name','UCSC').in('attended').pageRank().by(bothE('worksWith'))

What this means to me is that the initial pageRank energy will start at all people who attended the University of California at Santa Cruz and then diffuse by worksWith-edges. You may say -- "but Marko, regardless of the initial distribution, PageRank always converges to a stable state distribution." To which I say, "now include times(3)" -- that is, only iterate the energy 3 steps.
Now you have biased-PageRank, also (kinda sorta) like PageRank-priors. This is great for recommendation engines, as you aren't identifying a global energy distribution but a local one (rooted at the energy source). Do people like the idea of PageRank being biased by the initial traverser set? Moreover, imagine someone attended UCSC twice (let's say, for this example): double the energy for them?! There are other OLAP algorithms that can leverage this -- think PeerPressureVertexProgram (more VOTE_WEIGHT by the sources).

What are people's thoughts on the matter, and what ideas do you have to make Traversal-OLAP all the better in TinkerPop 3.2.0?

Thanks everyone,
Marko.

http://markorodriguez.com
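
To make the times(3) argument in item 2 concrete, here is a rough, standalone Java sketch. It is not TinkerPop code and not how PageRankVertexProgram is implemented. The class and the diffuse() and pathGraph() helpers are hypothetical names made up for this illustration, and returning the (1 - alpha) share of energy to the source vertices is an assumption (personalized-PageRank style), not something the proposal specifies. Each incoming traverser contributes one unit of energy, so a vertex reached twice starts with double the energy.

```java
import java.util.*;

public class BiasedPageRankSketch {

    // Hypothetical helper (not a TinkerPop API): diffuse energy from the source
    // traversers over an undirected adjacency list for a fixed number of steps.
    public static Map<String, Double> diffuse(Map<String, List<String>> adjacency,
                                              List<String> sourceTraversers,
                                              double alpha, int times) {
        // One unit of energy per traverser, normalized so the total is 1.
        Map<String, Double> seed = new HashMap<>();
        for (String v : adjacency.keySet()) seed.put(v, 0.0);
        double unit = 1.0 / sourceTraversers.size();
        for (String v : sourceTraversers) seed.merge(v, unit, Double::sum);

        Map<String, Double> energy = new HashMap<>(seed);
        for (int i = 0; i < times; i++) {
            // Assumption: the (1 - alpha) share restarts at the seeds.
            Map<String, Double> next = new HashMap<>();
            for (String v : adjacency.keySet()) next.put(v, (1 - alpha) * seed.get(v));
            // The alpha share is split evenly among each vertex's neighbors.
            for (Map.Entry<String, List<String>> e : adjacency.entrySet()) {
                List<String> neighbors = e.getValue();
                if (neighbors.isEmpty()) continue; // energy of isolated vertices is dropped
                double share = alpha * energy.get(e.getKey()) / neighbors.size();
                for (String n : neighbors) next.merge(n, share, Double::sum);
            }
            energy = next;
        }
        return energy;
    }

    // Toy "worksWith" graph: a path a - b - c - d - e.
    public static Map<String, List<String>> pathGraph() {
        Map<String, List<String>> g = new LinkedHashMap<>();
        g.put("a", Arrays.asList("b"));
        g.put("b", Arrays.asList("a", "c"));
        g.put("c", Arrays.asList("b", "d"));
        g.put("d", Arrays.asList("c", "e"));
        g.put("e", Arrays.asList("d"));
        return g;
    }

    public static void main(String[] args) {
        // Energy starts at "a" only and is iterated 3 times, so "e" (4 hops away)
        // never receives any energy: the result is local to the source.
        System.out.println(diffuse(pathGraph(), Arrays.asList("a"), 0.85, 3));
    }
}
```

With only three iterations the energy cannot travel further than three hops from the sources, which is the "local rather than global distribution" effect described in item 2. Passing the same vertex twice in sourceTraversers doubles its starting energy.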