Hi,

>> We should deprecate the MapReduce API and simply use Memory.
>
> You mentioned that these changes were for TinkerPop 3.3.x. We've done a
> really good job imo of controlling breaking changes. For the eventual 3.3.x
> line (don't know when we want to consider starting that), I think we should
> make that our opportunity to remove a lot of the stuff we deprecated over
> the post-GA releases. I suppose that should be a separate thread of
> discussion, but I mention it because if that ends up being our intent and
> we have further intent to deprecate things like the MapReduce API, we
> should look to deprecate those things in the 3.2.x/3.1.x lines now so that
> we can go into 3.3.x without any new @deprecated stuff (we would just have
> our breaking changes start there with no dead code lying around). Should we
> shape our 3.3.x strategy around that approach?

We will always have @Deprecations coming in. I don't think we will be able to say "TinkerPop 3.3.x is clean and clear of all deprecations."

Next, the MapReduce API is pretty core, and to just deprecate it now in a mid-major line is, I believe, a bit extreme. For 3.3.x, I want to be able to show (in the docs) how to do the MapReduce stuff via Memory and explain such things more in-depth.

Marko.

http://markorodriguez.com

>
> On Tue, May 24, 2016 at 3:35 PM, Marko Rodriguez <[email protected]>
> wrote:
>
>> Hello,
>>
>> For TinkerPop 3.3.0, I think we should clean some things up in
>> GraphComputer. This desire was started by Kuppitz using program() in
>> complex ways and realizing awkwardnesses that I believe we should fix. In
>> particular:
>>
>> https://issues.apache.org/jira/browse/TINKERPOP-1309
>> https://issues.apache.org/jira/browse/TINKERPOP-1306
>>
>> How do we do this?
>>
>> 1. Configuration and Memory should always play together to ensure
>>    that job chaining works.
>>      * memory = new ConfigurationMemory(configuration)
>>      * memory.store(configuration)
>> 2. In Hadoop, Memory should be persisted as such:
>>      * hdfs.ls("output")
>>          ==>graph
>>          ==>memory
>>      * This then perfectly reflects the ComputerResult return
>>        which is basically a Pair<Graph,Memory>
>>      * This means, while we are breaking things, ~g as the
>>        directory should go away, it should just be graph. (this is
>>        historic from when g was graph).
>>      * We should provide ComputerResult result =
>>        Storage.result("output").
>> 3. We should deprecate the MapReduce API and simply use Memory.
>>      * TraversalVertexProgram (arguably the most complex
>>        VertexProgram) no longer uses the MapReduce API, all reductions
>>        are via Memory.
>>      * Everything will then just be Graph (distributed/workers)
>>        or Memory (local/master).
>>
>> This will help us to clean up a bunch of ambiguity in the API. Everything
>> in OLAP is just about Graph, Memory, and VertexProgram.
>> VertexPrograms are
>> able to access previous Memory representations via Configuration (for OLAP
>> chains). This makes sense since VertexProgram.load() takes two arguments
>> --- Graph and Configuration, where Memory is a subset of the properties in
>> Configuration. The MapReduce API was added to allow people to post process
>> their graph after a VertexProgram had executed. However, Memory is more
>> powerful -- it can be modulated at each iteration and it can broadcast
>> results throughout the cluster. The drawback, Memory can only be used for
>> data that can be stored on a single machine -- counters, reductions, etc.
>> While MapReduce stored results in a distributed manner, the drawback, there
>> was no way to broadcast results and thus, it was only useful after a
>> computation, not during. I think we make a hard break and get rid of
>> MapReduce.
>>
>> Finally, while VertexPrograms have nothing to do with Traversals,
>> Traversals are fundamental to TinkerPop. With that, I think we should have
>> helpers like:
>>
>>     traversal = ConfigurationTraversal.load(configuration)
>>     ConfigurationTraversal.store(configuration, traversal)
>>     ConfigurationTraversal.synchronizeSideEffectsAndMemory(traversal, memory)
>>
>> This way Traversals are populated through Configurations (as they
>> currently are), but we make it easy for people to get the traversal,
>> configure the memory that will be used for traversal sideEffects, etc.
>>
>> In short, with some cleanup and thought, we should be able to make it
>> easier for people to write complex VertexProgram chains without a lot of
>> nasty serialization/configuration boilerplate.
>>
>> Thoughts?,
>> Marko.
>>
>> http://markorodriguez.com
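
To make the Memory-versus-MapReduce point in the quoted message concrete, here is a minimal Java sketch. It does not use TinkerPop's actual Memory or VertexProgram interfaces: SketchMemory, register, add, and countAboveMean are hypothetical names invented for illustration, and the "workers" are simulated with plain loops over partitions. It only tries to show the shape of the argument: values reduced into a small master-side store during one iteration can be read back by every worker in the next iteration, which a post-processing job that runs only after the computation cannot offer.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;

/** Illustrative only; none of these types are part of TinkerPop. */
public class MemorySketch {

    /** Hypothetical master-side store: each key carries its own reducer. */
    static final class SketchMemory {
        private final Map<String, Long> values = new HashMap<>();
        private final Map<String, BinaryOperator<Long>> reducers = new HashMap<>();

        void register(String key, long initial, BinaryOperator<Long> reducer) {
            values.put(key, initial);
            reducers.put(key, reducer);
        }

        /** Called by workers: fold a contribution into the key's current value. */
        void add(String key, long contribution) {
            values.merge(key, contribution, reducers.get(key));
        }

        long get(String key) {
            return values.get(key);
        }
    }

    /** Two-iteration toy job: compute the mean, then count values above it. */
    public static long countAboveMean(List<long[]> partitions) {
        SketchMemory memory = new SketchMemory();
        memory.register("sum", 0L, Long::sum);
        memory.register("count", 0L, Long::sum);
        memory.register("aboveMean", 0L, Long::sum);

        // Iteration 1: every (simulated) worker reduces its partition into memory.
        for (long[] partition : partitions) {
            for (long v : partition) {
                memory.add("sum", v);
                memory.add("count", 1L);
            }
        }

        // Between iterations the reduced values are small enough to live on one
        // machine, so they can be read and handed to every worker.
        double mean = (double) memory.get("sum") / memory.get("count");

        // Iteration 2: workers use the value produced by iteration 1.
        for (long[] partition : partitions) {
            for (long v : partition) {
                if (v > mean) {
                    memory.add("aboveMean", 1L);
                }
            }
        }
        return memory.get("aboveMean");
    }

    public static void main(String[] args) {
        List<long[]> partitions =
                Arrays.asList(new long[] {1, 2, 3}, new long[] {10, 20});
        System.out.println(countAboveMean(partitions));
    }
}
```

The sketch also reflects the drawback named in the message: everything kept in the store is a single small value per key (a counter or a reduction), not a distributed result set.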
