> We should deprecate the MapReduce API and simply use Memory.

You mentioned that these changes were for TinkerPop 3.3.x. We've done a
really good job, imo, of controlling breaking changes. For the eventual 3.3.x
line (I don't know when we want to consider starting that), I think we should
make it our opportunity to remove a lot of the stuff we deprecated over
the post-GA releases. That should probably be a separate thread of
discussion, but I mention it because, if that ends up being our intent and
we also intend to deprecate things like the MapReduce API, we
should deprecate those things in the 3.2.x/3.1.x lines now. That way we
can go into 3.3.x without any new @deprecated code: our breaking changes
would simply start there, with no dead code lying around. Should we
shape our 3.3.x strategy around that approach?



On Tue, May 24, 2016 at 3:35 PM, Marko Rodriguez <[email protected]>
wrote:

> Hello,
>
> For TinkerPop 3.3.0, I think we should clean some things up in
> GraphComputer. This came out of Kuppitz using program() in
> complex ways and running into some awkward spots that I believe we should
> fix. In particular:
>
>         https://issues.apache.org/jira/browse/TINKERPOP-1309
>         https://issues.apache.org/jira/browse/TINKERPOP-1306
>
> How do we do this?
>
>         1. Configuration and Memory should always play together to ensure
> that job chaining works.
>                 * memory = new ConfigurationMemory(configuration)
>                 * memory.store(configuration)
>         2. In Hadoop, Memory should be persisted as such:
>                 * hdfs.ls("output")
>                     ==>graph
>                     ==>memory
>                 * This directly mirrors the ComputerResult return value,
> which is basically a Pair<Graph,Memory>.
>                 * Since we are breaking things anyway, the ~g
> directory should go away and simply be named graph (it is a holdover
> from when g meant graph).
>                 * We should provide ComputerResult result =
> Storage.result("output").
>         3. We should deprecate the MapReduce API and simply use Memory.
>                 * TraversalVertexProgram (arguably the most complex
> VertexProgram) no longer uses the MapReduce API; all reductions go through
> Memory.
>                 * Everything will then just be Graph (distributed/workers)
> or Memory (local/master).
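A minimal, JDK-only sketch of the Configuration/Memory round trip in item 1, assuming a plain Map<String, Object> stands in for the Apache Commons Configuration object. ConfigurationMemory below is a hypothetical stand-in for the proposed class, not an existing TinkerPop API, and the "example.memory." key prefix is invented for illustration. It shows one job storing its Memory into the configuration and the next job in the chain rebuilding that Memory from it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: ConfigurationMemory is the *proposed* class, not an existing
// TinkerPop API; a Map stands in for Configuration and the key prefix is invented.
public class ConfigurationMemoryDemo {

    static final String PREFIX = "example.memory."; // invented key prefix

    static final class ConfigurationMemory {
        private final Map<String, Object> values = new HashMap<>();

        /** Empty memory for the first job in a chain. */
        ConfigurationMemory() {
        }

        /** Rebuild memory from the prefixed keys an earlier job stored. */
        ConfigurationMemory(Map<String, Object> configuration) {
            for (Map.Entry<String, Object> e : configuration.entrySet()) {
                if (e.getKey().startsWith(PREFIX)) {
                    values.put(e.getKey().substring(PREFIX.length()), e.getValue());
                }
            }
        }

        void set(String key, Object value) { values.put(key, value); }

        Object get(String key) { return values.get(key); }

        Set<String> keys() { return values.keySet(); }

        /** Copy every memory entry into the configuration under PREFIX. */
        void store(Map<String, Object> configuration) {
            for (Map.Entry<String, Object> e : values.entrySet()) {
                configuration.put(PREFIX + e.getKey(), e.getValue());
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Object> configuration = new HashMap<>();
        configuration.put("job.name", "first"); // unrelated setting, not part of memory

        // Job 1 ends with a counter in its memory and stores it into the configuration.
        ConfigurationMemory first = new ConfigurationMemory();
        first.set("count", 42L);
        first.store(configuration);

        // Job 2 is built from the same configuration and sees job 1's memory.
        ConfigurationMemory second = new ConfigurationMemory(configuration);
        System.out.println(second.get("count") + " " + second.keys()); // prints "42 [count]"
    }
}
```

Keeping the memory keys under a prefix lets the same configuration carry unrelated job settings without them leaking into Memory.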
>
> This will help us clean up a lot of ambiguity in the API. Everything
> in OLAP is just about Graph, Memory, and VertexProgram. VertexPrograms are
> able to access previous Memory representations via Configuration (for OLAP
> chains). This makes sense since VertexProgram.load() takes two arguments,
> Graph and Configuration, where Memory is a subset of the properties in
> Configuration. The MapReduce API was added to allow people to post-process
> their graph after a VertexProgram had executed. However, Memory is more
> powerful: it can be modulated at each iteration and it can broadcast
> results throughout the cluster. The drawback is that Memory can only be
> used for data that fits on a single machine: counters, reductions, etc.
> MapReduce, on the other hand, stored results in a distributed manner, but
> there was no way to broadcast them, so it was only useful after a
> computation, not during one. I think we should make a hard break and get
> rid of MapReduce.
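To make the Memory-versus-MapReduce point concrete, here is a single-JVM sketch of the Memory semantics described above, under simplifying assumptions: writes during an iteration are combined with a per-key reducer, and the reduced values are published (broadcast) to every vertex only at the barrier between iterations. SimpleMemory and countAboveMean are hypothetical names, and a plain loop stands in for distributed workers. The second iteration uses the mean broadcast by the first, something a MapReduce job run after the computation could not feed back into it.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;

// Hypothetical single-JVM model of Memory semantics; SimpleMemory and countAboveMean
// are illustrative names, and a loop over a list stands in for distributed workers.
public class MemoryReductionDemo {

    static final class SimpleMemory {
        private final Map<String, BinaryOperator<Long>> reducers = new HashMap<>();
        private Map<String, Long> current = new HashMap<>(); // values broadcast to all workers
        private Map<String, Long> next = new HashMap<>();    // values being reduced this iteration

        void register(String key, BinaryOperator<Long> reducer) { reducers.put(key, reducer); }

        /** Read the value published at the end of the previous iteration. */
        long get(String key) { return current.getOrDefault(key, 0L); }

        /** Contribute a value; contributions to the same key are combined with its reducer. */
        void add(String key, long value) { next.merge(key, value, reducers.get(key)); }

        /** Barrier between iterations: publish the reduced values to every worker. */
        void completeIteration() {
            current = next;
            next = new HashMap<>();
        }
    }

    /** Counts vertex values strictly above the mean, in two iterations. */
    static long countAboveMean(List<Long> vertexValues) {
        SimpleMemory memory = new SimpleMemory();
        memory.register("sum", Long::sum);
        memory.register("count", Long::sum);
        memory.register("above", Long::sum);

        // Iteration 1: every vertex adds to the global sum and count.
        for (long v : vertexValues) {
            memory.add("sum", v);
            memory.add("count", 1L);
        }
        memory.completeIteration();

        // Iteration 2: every vertex reads the broadcast sum and count.
        long sum = memory.get("sum");
        long count = memory.get("count");
        for (long v : vertexValues) {
            if (v * count > sum) { // same as v > sum / count, without integer division
                memory.add("above", 1L);
            }
        }
        memory.completeIteration();
        return memory.get("above");
    }

    public static void main(String[] args) {
        System.out.println(countAboveMean(Arrays.asList(1L, 2L, 3L, 10L))); // prints 1
    }
}
```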
>
> Finally, while VertexPrograms have nothing to do with Traversals,
> Traversals are fundamental to TinkerPop. With that, I think we should have
> helpers like:
>
>         traversal = ConfigurationTraversal.load(configuration)
>         ConfigurationTraversal.store(configuration, traversal)
>         ConfigurationTraversal.synchronizeSideEffectsAndMemory(traversal,
> memory)
>
> This way Traversals are populated through Configurations (as they
> currently are), but we make it easy for people to get the traversal,
> configure the memory that will be used for traversal sideEffects, etc.
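A rough sketch of the three proposed helpers, assuming the traversal is kept in the configuration as a Base64 string of its Java-serialized form. SimpleTraversal is a hypothetical stand-in for a real Traversal, a Map again plays the role of Configuration, the configuration key is invented, and synchronizeSideEffectsAndMemory here adopts one possible rule: a value already in Memory wins over the traversal's initial side-effect value.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Base64;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: SimpleTraversal, ConfigurationTraversal and TRAVERSAL_KEY are
// illustrative stand-ins, not TinkerPop classes; a Map plays the role of Configuration.
public class ConfigurationTraversalDemo {

    static final String TRAVERSAL_KEY = "example.configurationTraversal.traversal"; // invented key

    /** Stand-in for a Traversal: step names plus named side-effects. */
    static final class SimpleTraversal implements Serializable {
        private static final long serialVersionUID = 1L;
        final List<String> steps = new ArrayList<>();
        final Map<String, Object> sideEffects = new HashMap<>();
    }

    static final class ConfigurationTraversal {

        /** Serialize the traversal and put it into the configuration as Base64 text. */
        static void store(Map<String, Object> configuration, SimpleTraversal traversal) {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(traversal);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            configuration.put(TRAVERSAL_KEY, Base64.getEncoder().encodeToString(bytes.toByteArray()));
        }

        /** Rebuild the traversal that store() wrote into the configuration. */
        static SimpleTraversal load(Map<String, Object> configuration) {
            byte[] bytes = Base64.getDecoder().decode((String) configuration.get(TRAVERSAL_KEY));
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
                return (SimpleTraversal) in.readObject();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            } catch (ClassNotFoundException e) {
                throw new IllegalStateException(e);
            }
        }

        /** Make every side-effect a memory key; a value already in memory wins. */
        static void synchronizeSideEffectsAndMemory(SimpleTraversal traversal, Map<String, Object> memory) {
            for (Map.Entry<String, Object> e : traversal.sideEffects.entrySet()) {
                memory.putIfAbsent(e.getKey(), e.getValue());
                e.setValue(memory.get(e.getKey()));
            }
        }
    }

    public static void main(String[] args) {
        SimpleTraversal traversal = new SimpleTraversal();
        traversal.steps.add("V");
        traversal.steps.add("out");
        traversal.steps.add("count");
        traversal.sideEffects.put("x", 0L);

        Map<String, Object> configuration = new HashMap<>();
        ConfigurationTraversal.store(configuration, traversal);
        SimpleTraversal loaded = ConfigurationTraversal.load(configuration);

        Map<String, Object> memory = new HashMap<>();
        memory.put("x", 7L); // e.g. a value carried over from an earlier job in the chain
        ConfigurationTraversal.synchronizeSideEffectsAndMemory(loaded, memory);
        System.out.println(loaded.steps + " " + loaded.sideEffects); // prints "[V, out, count] {x=7}"
    }
}
```

Encoding the traversal as a string keeps it a single ordinary configuration property, so it travels with the rest of the job settings from one VertexProgram to the next.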
>
> In short, with some cleanup and thought, we should be able to make it
> easier for people to write complex VertexProgram chains without a lot of
> nasty serialization/configuration boilerplate.
>
> Thoughts?
> Marko.
>
> http://markorodriguez.com
>
>
