Re: [DISCUSS] How do you want your OLAP?

Marko Rodriguez Thu, 18 Feb 2016 10:59:17 -0800

Hi everyone,

In branch TINKERPOP-1154, we have the following:


        https://gist.github.com/okram/6b58cb63e8b70668c7c0
        https://gist.github.com/okram/6124a68978b81abfcdd4
        https://gist.github.com/okram/eea54646bed4551d6f4d

As you can see, given my 1/2/3 list from this DISCUSS thread, we have implemented 
all three items!

Once this gets VOTE'd in, I will publish the SNAPSHOT docs for everyone.

Enjoy,
Marko.

http://markorodriguez.com

On Feb 13, 2016, at 9:49 AM, Marko Rodriguez <okramma...@gmail.com> wrote:

> Dar. I knew there was a #3. You can never list things with bold itemizations 
> and NOT have a #3.
> 
> 3. Traversers as sinks to subsequent steps.
> 
> This falls from (2) and can be summed up as such:
> 
> g.V().has('name','UCSC').in('attended').
>   pageRank().by(bothE('worksWith')).times(3).
>   order().by('pageRank').limit(10).values('name')
> 
> Question: Which vertices are being ordered and limited, and whose names are 
> returned? Algorithmically, before executing PageRankVertexProgramStep, there 
> will be traversers at the UCSC attendees (HALTED_TRAVERSERS). Then PageRank 
> executes. After that, all vertices connected to the UCSC attendee sources by 
> "worksWith" edges will have PageRank values. What is ultimately emitted? We 
> have two choices:
> 
>       1. The top 10 PageRank'd UCSC attendees within the "worksWith" subgraph?
>               - revive the HALTED_TRAVERSERS from the previous TraversalVertexProgramStep execution.
>       2. The top 10 PageRank'd vertices within the "worksWith" subgraph?
>               - put traversers only at the vertices with energy from the PageRankVertexProgramStep.
> 
> I sort of like #1 because you can also make #2 work by doing:
> 
> g.V().has('name','UCSC').in('attended').
>   pageRank().by(bothE('worksWith')).times(3).
>   V().order().by('pageRank').limit(10).values('name') 
> 
> Notice that for all the UCSC attendees leaving PageRankVertexProgramStep, we 
> simply jump back to the global set of vertices and order everyone by their 
> pageRank.
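
To make the two choices concrete, here is a rough sketch in plain Python (dictionaries only, not TinkerPop code). The vertex names, the rank values, and the helpers `emit_halted` and `emit_global` are all invented for illustration; only the two emission policies come from the text above. Ordering is ascending, which is what a bare order().by(...) does by default.

```python
# Hypothetical PageRank values left on the vertices by the OLAP step.
page_rank = {"alice": 0.40, "bob": 0.10, "carol": 0.25, "dave": 0.55}

# Hypothetical traversers that were halted at the UCSC attendees
# before the PageRank step ran.
halted = ["alice", "bob"]


def emit_halted(halted, page_rank, limit):
    """Choice 1: revive the traversers that entered the PageRank step."""
    return sorted(halted, key=lambda v: page_rank[v])[:limit]


def emit_global(page_rank, limit):
    """Choice 1 followed by V(): restart from every vertex in the graph."""
    return sorted(page_rank, key=lambda v: page_rank[v])[:limit]


attendees_only = emit_halted(halted, page_rank, limit=10)
everyone = emit_global(page_rank, limit=3)
```

With choice 1 as the rule, the second behaviour costs one extra V() step, whereas the reverse direction would require remembering the halted traversers somewhere.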
> 
> In essence: are the traversers that enter a XXXVertexProgramStep the 
> traversers that exit a XXXVertexProgramStep? For TraversalVertexProgramStep 
> it's like that. For PageRankVertexProgramStep, it can make sense and actually 
> yield interesting ramifications such as:
> 
> g.V().has('name','UCSC').in('attended').
>   pageRank().by(bothE('worksWith')).times(3).
>   V().has('title','Ph.D.').order().by('pageRank').limit(10).values('name')    
> 
> I think a decision that is consistent is important so that people always know 
> the traversers going in are X and the result is always like Y. If we get too 
> crazy with each XXXVertexProgramStep doing different things with chained OLAP 
> steps, it might lead to language confusions.
> 
> Decisions, decisions, decisions….
> 
> Marko.
> 
> http://markorodriguez.com
> 
> On Feb 13, 2016, at 9:25 AM, Marko Rodriguez <okramma...@gmail.com> wrote:
> 
>> Hi,
>> 
>> TinkerPop 3.2.0 boasts the ability to have multiple OLAP/OLTP jobs within a 
>> single Traversal instance. The following tickets have been closed and merged 
>> to master/ (TinkerPop 3.2.0-SNAPSHOT).
>> 
>>      https://issues.apache.org/jira/browse/TINKERPOP-1140
>>      https://issues.apache.org/jira/browse/TINKERPOP-971
>>      https://issues.apache.org/jira/browse/TINKERPOP-962
>> 
>> Instead of going through the gnarly details, I will explain with a simple 
>> example:
>> 
>> gremlin> g = TinkerFactory.createModern().traversal().withComputer()
>> ==>graphtraversalsource[tinkergraph[vertices:6 edges:6], tinkergraphcomputer]
>> gremlin> g.V().pageRank().order().by(PAGE_RANK).valueMap()
>> ==>[gremlin.pageRankVertexProgram.pageRank:[0.15000000000000002], 
>> name:[peter], age:[35]]
>> ==>[gremlin.pageRankVertexProgram.pageRank:[0.15000000000000002], 
>> name:[marko], age:[29]]
>> ==>[gremlin.pageRankVertexProgram.pageRank:[0.19250000000000003], 
>> name:[josh], age:[32]]
>> ==>[gremlin.pageRankVertexProgram.pageRank:[0.19250000000000003], 
>> name:[vadas], age:[27]]
>> ==>[gremlin.pageRankVertexProgram.pageRank:[0.23181250000000003], 
>> name:[ripple], lang:[java]]
>> ==>[gremlin.pageRankVertexProgram.pageRank:[0.4018125], name:[lop], 
>> lang:[java]]
>> gremlin>
>> gremlin> g.V().pageRank().order().by(PageRankVertexProgram.PAGE_RANK).valueMap().toString()
>> ==>[GraphStep([],vertex), PageRankVertexProgramStep([VertexStep(OUT,edge)]), 
>> OrderGlobalStep([incr(value(gremlin.pageRankVertexProgram.pageRank))]), 
>> PropertyMapStep(value)]
>> gremlin>
>> gremlin> g.V().pageRank().order().by(PageRankVertexProgram.PAGE_RANK).valueMap().iterate().toString()
>> ==>[PageRankVertexProgramStep([VertexStep(OUT,edge)]), 
>> TraversalVertexProgramStep([OrderGlobalStep([incr(value(gremlin.pageRankVertexProgram.pageRank))])]),
>>  ComputerResultStep, PropertyMapStep(value)]
>> 
>> As you can see from the compilation, we have two OLAP jobs and one OLTP job in 
>> a single Traversal! TraversalVertexProgramStep was always what an OLAP 
>> Gremlin traversal was, but now it's a step in and of itself, just like any 
>> other step in the Gremlin machine.
>> 
>> OLAP  [PageRankVertexProgramStep([VertexStep(OUT,edge)]),
>> OLAP   TraversalVertexProgramStep([OrderGlobalStep([incr(value(gremlin.pageRankVertexProgram.pageRank))])]),
>> OLTP   ComputerResultStep, PropertyMapStep(value)]
>> 
>> There is still more work to be done/tweaked in this area for 3.2.0. That is 
>> what this [DISCUSS] email is for. There are some decisions we can make, and I 
>> would like people's thoughts on the matter before we make them:
>> 
>> ------------------------------------------------------------------------
>> 
>> 1. Parameterizing OLAP steps.
>> 
>> What about the following parameterization:
>> 
>> g.V().pageRank(0.85).times(20).by(outE('knows')).by('page.rank')
>> 
>> This says I want a PageRankVertexProgram executed with an alpha parameter of 
>> 0.85, iterating 20 times, using "knows" edges for the energy diffusion, and 
>> saving the result on the vertex under the property "page.rank". As you 
>> may know, "times" and "by" are step-modulators. However, when by(string) is 
>> used, it currently compiles to by(values(string).limit(1)). If we go down 
>> this road of adding more VertexPrograms, more by-modulations, I think we 
>> need to make a new interface called ByModulating that allows the step to 
>> decide (not the Traversal) what by(traversal), by(string), by(number), 
>> by(function), etc. mean to it. Likewise, we would need TimesModulating where 
>> RepeatStep and PageRankVertexProgramStep will do two different things with 
>> that information. Thoughts?
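
The idea of letting the step, rather than the Traversal, interpret its modulators could look roughly like the sketch below. It is plain Python, not TinkerPop's Java API: the classes, the `modulate_by`/`modulate_times` method names, and the `"pageRank"` default key are hypothetical stand-ins chosen for illustration.

```python
class ByModulating:
    """Hypothetical interface: the step decides what a by(...) argument means."""

    def modulate_by(self, arg):
        raise NotImplementedError


class TimesModulating:
    """Hypothetical interface: the step decides what times(n) means."""

    def modulate_times(self, n):
        raise NotImplementedError


class PageRankStep(ByModulating, TimesModulating):
    def __init__(self, alpha=0.85):
        self.alpha = alpha
        self.edge_traversal = None       # which edges carry the energy
        self.property_key = "pageRank"   # placeholder default key
        self.iterations = None

    def modulate_by(self, arg):
        if isinstance(arg, str):
            self.property_key = arg      # by(string): where to store the result
        elif callable(arg):
            self.edge_traversal = arg    # by(traversal): edges to diffuse over
        else:
            raise TypeError("unsupported by() argument: %r" % (arg,))
        return self

    def modulate_times(self, n):
        self.iterations = n              # times(n): number of PageRank iterations
        return self


class RepeatStep(TimesModulating):
    def __init__(self):
        self.max_loops = None

    def modulate_times(self, n):
        self.max_loops = n               # times(n): loop bound, a different meaning
        return self


def out_e_knows(vertex):
    """Stand-in for outE('knows'); returns no edges in this sketch."""
    return []


step = (PageRankStep(0.85)
        .modulate_times(20)
        .modulate_by(out_e_knows)
        .modulate_by("page.rank"))
loop = RepeatStep().modulate_times(3)
```

Here by(string) never gets rewritten into a values(string) traversal; PageRankStep is free to treat it as a property key, while another step could keep the existing meaning.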
>> 
>> 2. Traversers as source points to OLAP steps.
>> 
>> Imagine the following traversal:
>> 
>> g.V().has('name','UCSC').in('attended').pageRank().by(bothE('worksWith'))
>> 
>> What this means to me is the initial pageRank energy will start at all 
>> people who attended the University of California at Santa Cruz and then 
>> diffuse by worksWith-edges. You may say -- "but Marko, regardless of the 
>> initial distribution, PageRank always converges to a stable state 
>> distribution." To which I say, "now include times(3)"-- that is, only 
>> iterate the energy 3-steps. Now you have biased-PageRank also (kinda sorta) 
>> like PageRank-priors. This is great for recommendation engines as you aren't 
>> identifying a global energy distribution, but a local one (rooted at the 
>> energy source). Do people like the idea of PageRank being biased by the 
>> initial traverser set? Moreover, imagine someone attended UCSC twice (let's 
>> say, for this example): do we double the energy for them?! There are other OLAP 
>> algorithms that can leverage this -- think PeerPressureVertexProgram (more 
>> VOTE_WEIGHT by the sources).
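
A small numeric sketch of the biased, truncated PageRank described above, in plain Python. It uses a personalized-PageRank style update as an assumption; it is not the PageRankVertexProgram implementation. The function name `biased_page_rank`, the chain graph, and the vertex labels are invented for illustration, and every vertex is assumed to have at least one edge.

```python
from collections import Counter


def biased_page_rank(edges, sources, alpha=0.85, times=3):
    """Diffuse energy from the source traversers over undirected edges."""
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    # One unit of energy per incoming traverser, so a vertex holding two
    # traversers starts with double the energy; normalized to sum to 1.
    counts = Counter(sources)
    total = sum(counts.values())
    seed = {v: counts[v] / total for v in neighbors}

    energy = dict(seed)
    for _ in range(times):
        # Each vertex keeps a (1 - alpha) share of its seed and receives an
        # alpha share of what its neighbors split evenly among their edges.
        energy = {
            v: (1 - alpha) * seed[v]
            + alpha * sum(energy[u] / len(neighbors[u]) for u in neighbors[v])
            for v in neighbors
        }
    return energy


# A six-vertex chain a-b-c-d-e-f standing in for the "worksWith" subgraph.
chain = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("e", "f")]

# "a" attended twice, "b" once.
rank = biased_page_rank(chain, sources=["a", "a", "b"], times=3)
```

With only three iterations, no energy reaches "f" (it is more than three hops from both sources), so the result is a local distribution rooted at the traverser set rather than the global stationary one.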
>> 
>> What are people's thoughts on the matter, and what ideas do you have to make 
>> Traversal-OLAP all the better in TinkerPop 3.2.0?
>> 
>> Thanks everyone,
>> Marko.
>> 
>> http://markorodriguez.com
>> 
>
