Hi everyone, In branch TINKERPOP-1154, we have the following:
https://gist.github.com/okram/6b58cb63e8b70668c7c0 https://gist.github.com/okram/6124a68978b81abfcdd4 https://gist.github.com/okram/eea54646bed4551d6f4d As you can see, given my 1/2/3 list from this DISCUSS, we have implemented all 1, 2, and 3! Once this gets VOTE'd in, I will publish the SNAPSHOT docs for everyone. Enjoy, Marko. http://markorodriguez.com On Feb 13, 2016, at 9:49 AM, Marko Rodriguez <okramma...@gmail.com> wrote: > Dar. I knew there was a #3. You can never list things with bold itemizations > and NOT have a #3. > > 3. Traversers as sinks to subsequent steps. > > This falls from (2) and can be summed up as such: > > g.V().has('name','UCSC').in('attended'). > pageRank().by(bothE('worksWith')).times(3). > order().by('pageRank').limit(10).values('name') > > Question: What are the vertices being ordered and limited whose names are > returned? Algorithmically, before executing PageRankVertexProgramStep, there > will be traversers at the UCSC attendees (HALTED_TRAVERSERS). Then PageRank > executes. After that, all vertices "worksWith"-edges connected to the UCSC > attendee sources will have PageRank values. What is ultimately emitted -- we > have two choices: > > 1. The top 10 PageRank'd UCSC attendees within the "worksWith" subgraph? > - revive the HALTED_TRAVERSERS from the previous > TraversalVertexProgramStep execution. > 2. The top 10 PageRank'd vertices within the "worksWith" subgraph? > - put traversers only at the vertices with energy from the > PageRankVertexProgramStep. > > I sort of like the #1 because you can also make #2 work by doing: > > g.V().has('name','UCSC').in('attended'). > pageRank().by(bothE('worksWith')).times(3). > V().order().by('pageRank').limit(10).values('name') > > Notice that for all the UCSC attendees leaving PageRankVertexProgramStep, we > simply jump back to the global set of vertices and order everyone by their > pageRank. > > In essence, are the traversers that enter a XXXVertexProgramStep the > traversers that exit a XXXVertexProgramStep. For TraversalVertexProgramStep > its like that. For PageRankVertexProgram step, it can make sense and actually > yield interesting ramifications such as: > > g.V().has('name','UCSC').in('attended'). > pageRank().by(bothE('worksWith')).times(3). > V().has('title','Ph.D.').order().by('pageRank').limit(10).values('name') > > I think a decision that is consistent is important so that people always know > the traversers going in are X and the result is always like Y. If we get too > crazy with each XXXVertexProgramStep doing different things with chained OLAP > steps, it might lead to language confusions. > > Decisions, decisions, decisions…. > > Marko. > > http://markorodriguez.com > > On Feb 13, 2016, at 9:25 AM, Marko Rodriguez <okramma...@gmail.com> wrote: > >> Hi, >> >> TinkerPop 3.2.0 boasts the ability to have multiple OLAP/OLTP jobs within a >> single Traversal instance. The following tickets have been closed and merged >> to master/ (TinkerPop 3.2.0-SNAPSHOT). >> >> https://issues.apache.org/jira/browse/TINKERPOP-1140 >> https://issues.apache.org/jira/browse/TINKERPOP-971 >> https://issues.apache.org/jira/browse/TINKERPOP-962 >> >> Instead of going through the gnarly details, I will explain with a simple >> example: >> >> gremlin> g = TinkerFactory.createModern().traversal().withComputer() >> ==>graphtraversalsource[tinkergraph[vertices:6 edges:6], tinkergraphcomputer] >> gremlin> g.V().pageRank().order().by(PAGE_RANK).valueMap() >> ==>[gremlin.pageRankVertexProgram.pageRank:[0.15000000000000002], >> name:[peter], age:[35]] >> ==>[gremlin.pageRankVertexProgram.pageRank:[0.15000000000000002], >> name:[marko], age:[29]] >> ==>[gremlin.pageRankVertexProgram.pageRank:[0.19250000000000003], >> name:[josh], age:[32]] >> ==>[gremlin.pageRankVertexProgram.pageRank:[0.19250000000000003], >> name:[vadas], age:[27]] >> ==>[gremlin.pageRankVertexProgram.pageRank:[0.23181250000000003], >> name:[ripple], lang:[java]] >> ==>[gremlin.pageRankVertexProgram.pageRank:[0.4018125], name:[lop], >> lang:[java]] >> gremlin> >> gremlin> >> g.V().pageRank().order().by(PageRankVertexProgram.PAGE_RANK).valueMap().toString() >> ==>[GraphStep([],vertex), PageRankVertexProgramStep([VertexStep(OUT,edge)]), >> OrderGlobalStep([incr(value(gremlin.pageRankVertexProgram.pageRank))]), >> PropertyMapStep(value)] >> gremlin> >> gremlin> >> g.V().pageRank().order().by(PageRankVertexProgram.PAGE_RANK).valueMap().iterate().toString() >> ==>[PageRankVertexProgramStep([VertexStep(OUT,edge)]), >> TraversalVertexProgramStep([OrderGlobalStep([incr(value(gremlin.pageRankVertexProgram.pageRank))])]), >> ComputerResultStep, PropertyMapStep(value)] >> >> As you can see from the compilation we have 2 OLAP jobs and one OLTP job in >> a single Traversal! TraversalVertexProgramStep was always what an OLAP >> Gremlin traversal was, but now its a step in and of itself just like any >> other step in on the Gremlin machine. >> >> OLAP [PageRankVertexProgramStep([VertexStep(OUT,edge)]), >> OLAP >> TraversalVertexProgramStep([OrderGlobalStep([incr(value(gremlin.pageRankVertexProgram.pageRank))])]), >> >> OLTP ComputerResultStep, PropertyMapStep(value)] >> >> There is still more work to be done/tweaked in the area for 3.2.0. This is >> what this email [DISCUSS] is for. There are some decisions we can make and I >> would like people's thoughts on the matter before we make them: >> >> ------------------------------------------------------------------------ >> >> 1. Parameterizing OLAP steps. >> >> What about the following parameterization below: >> >> g.V().pageRank(0.85).times(20).by(outE('knows')).by('page.rank') >> >> This says I want a PageRankVertexProgram executed with an alpha parameter at >> 0.85, to iterate 20 times, using "knows" edges for the energy diffusion, and >> the property to save the result to on the vertex being "page.rank." As you >> may know "times" and "by" are step-modulators. However, when by(string) is >> used, it currently compiles to by(values(string).limit(1)). If we go down >> this road of adding more VertexPrograms, more by-modulations, I think we >> need to make a new interface called ByModulating that allows the step to >> decide (not the Traversal) what by(traversal), by(string), by(number), >> by(function), etc. mean to it. Likewise, we would need TimesModulating where >> RepeatStep and PageRankVertexProgramStep will do two different things with >> that information. Thoughts? >> >> 2. Traversers as source points to OLAP steps. >> >> Imagine the following traversal: >> >> g.V().has('name','UCSC').in('attended').pageRank().by(bothE('worksWith')) >> >> What this means to me is the initial pageRank energy will start at all >> people who attended the University of California at Santa Cruz and then >> diffuse by worksWith-edges. You may say -- "but Marko, regardless of the >> initial distribution, PageRank always converges to a stable state >> distribution." To which I say, "now include times(3)"-- that is, only >> iterate the energy 3-steps. Now you have biased-PageRank also (kinda sorta) >> like PageRank-priors. This is great for recommendation engines as you aren't >> identifying a global energy distribution, but a local one (rooted at the >> energy source). Do people like the idea of PageRank being biased by the >> initial traverser set? Moreover, imagine someone attended UCSC twice (lets >> say for this example), double the energy for them?! There are other OLAP >> algorithms that can leverage this -- think PeerPressureVertexProgram (more >> VOTE_WEIGHT by the sources). >> >> What are people's thoughts on the matter and what ideas do you have to make >> Traversal-OLAP all the better in TinkerPop 3.2.0. >> >> Thanks everyone, >> Marko. >> >> http://markorodriguez.com >> >