[ 
https://issues.apache.org/jira/browse/TINKERPOP-1166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15152717#comment-15152717
 ] 

Marko A. Rodriguez commented on TINKERPOP-1166:
-----------------------------------------------

Jotted this out in a notebook and this feels like the right way to do it.

{code}
Memory.merge(String, Mergeable)
{code}

{code}
public class SumMergeable<T extends Number> implements Mergeable<T> {
  public SumMergeable<T> merge(final SumMergeable<T> other);
  public T get();
}
{code}

Both {{CountGlobalStep}} and {{SumGlobalStep}} would use the same {{SumMergeable}} 
class. However, the merged value for {{CountGlobalStep}} is just 
{{traverser.bulk()}}; for {{SumGlobalStep}}, it's {{traverser.get() * traverser.bulk()}}.
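To make the idea concrete, here is a minimal, hypothetical sketch of the {{Mergeable}}/{{SumMergeable}} pair (simplified to {{Long}} instead of the generic {{T extends Number}}; none of these names exist in TinkerPop yet):

{code}
// Hypothetical sketch of the proposed API; simplified to Long instead of
// the generic <T extends Number> used above.
interface Mergeable<T> {
    Mergeable<T> merge(Mergeable<T> other);
    T get();
}

final class SumMergeable implements Mergeable<Long> {
    private final long sum;
    SumMergeable(final long initial) { this.sum = initial; }
    @Override
    public SumMergeable merge(final Mergeable<Long> other) {
        // CountGlobalStep would feed traverser.bulk() in here;
        // SumGlobalStep would feed traverser.get() * traverser.bulk().
        return new SumMergeable(this.sum + other.get());
    }
    @Override
    public Long get() { return this.sum; }
}

public class MergeDemo {
    public static void main(final String[] args) {
        // Two traversers with bulks 3 and 2 merge into a count of 5.
        final Mergeable<Long> count = new SumMergeable(0L)
            .merge(new SumMergeable(3L))
            .merge(new SumMergeable(2L));
        System.out.println(count.get()); // prints 5
    }
}
{code}

Immutable merges like this keep the reduction associative, which is what lets the engine combine partial results in any order.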

Next, we can start to slide this into {{GraphComputer}} and push out 
{{MapReduce}} (maybe).... Check it:

Let's say we have another interface called {{VertexMergeable}} that extends 
{{Mergeable}} and adds this method:

{code}
public Mergeable<T> initial(final Vertex vertex)
{code}

{code}
graph.compute(SparkGraphComputer).program(MyVertexProgram).merge(MyMergeable).merge(...).merge(...)
{code}

The {{GraphComputer.merge(VertexMergeable)}} simply gets its initial value by 
first processing the current Vertex. That's it. At that point, this is identical 
to {{MapReduce}} EXCEPT! that in {{MapReduce}}, if you ONLY do a Map with no 
Reduce, you still have output splits distributed across the cluster, whereas in 
this model that would be VERY BAD to do without some sort of filtering, or else 
you will merge a massive list onto a single machine.
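For illustration, a hypothetical {{VertexMergeable}} sketch might look like this ({{Vertex}} is stubbed to a single accessor here, and {{VertexSum}} is an invented example, not a real TinkerPop class; {{Mergeable}} is repeated so the sketch stands alone):

{code}
// Hypothetical sketch; Vertex is stubbed as a functional interface since
// only a per-vertex seed value is needed for the illustration.
interface Vertex {
    long value(); // stand-in for whatever local data the merge seeds from
}

interface Mergeable<T> {
    Mergeable<T> merge(Mergeable<T> other);
    T get();
}

interface VertexMergeable<T> extends Mergeable<T> {
    // Replaces MapReduce's map phase: seed the merge from the local vertex.
    Mergeable<T> initial(Vertex vertex);
}

final class VertexSum implements VertexMergeable<Long> {
    private final long sum;
    VertexSum(final long sum) { this.sum = sum; }
    @Override
    public Mergeable<Long> initial(final Vertex vertex) {
        return new VertexSum(vertex.value());
    }
    @Override
    public Mergeable<Long> merge(final Mergeable<Long> other) {
        return new VertexSum(this.sum + other.get());
    }
    @Override
    public Long get() { return this.sum; }
}

public class VertexMergeDemo {
    public static void main(final String[] args) {
        final VertexSum seed = new VertexSum(0L);
        // Each vertex contributes its local value; partial results then
        // merge down to one total, like a combine-then-reduce.
        final Mergeable<Long> total = seed.initial(() -> 3L)
            .merge(seed.initial(() -> 4L));
        System.out.println(total.get()); // prints 7
    }
}
{code}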

This is all very simple to do and I believe is easier to grok than the 
{{MapReduce}} extension we added, because it's all part of the {{VertexProgram}} 
execution and not some auxiliary appendage.

> Add Memory.reduce() as option to Memory implementations.
> --------------------------------------------------------
>
>                 Key: TINKERPOP-1166
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP-1166
>             Project: TinkerPop
>          Issue Type: Improvement
>          Components: hadoop, process, tinkergraph
>    Affects Versions: 3.1.2-incubating
>            Reporter: Marko A. Rodriguez
>
> Currently {{Memory}} supports {{incr}}, {{and}}, {{or}}, ... These are great 
> and what people will typically use. However, we should also provide the 
> generalization, which is simply {{Memory.reduce}}. In this situation, 
> {{incr}}, {{or}}, {{and}}, etc. are just specializations of {{Memory.reduce}}.
> How would it work?
> When memory is initialized in a {{VertexProgram}}, it would be like this:
> {code}
> memory.set("myReduction", new MyReducingFunction(0))
> {code}
> Then {{ReducingFunction}} would look like this:
> {code}
> public class ReducingFunction<A> implements BinaryOperator<A> {
>   public A getInitialValue();
>   public A apply(A first, A second);
> }
> {code}
> Easy peasy. Note that both Spark and Giraph support these kinds of 
> function-based reductions in their respective "memory engines." It will, of 
> course, be easy to add this functionality to TinkerGraphComputer as well.
> Why do this? For two reasons:
> 1. We get extra flexibility in {{Memory}}.
> 2. https://issues.apache.org/jira/browse/TINKERPOP-1164



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
