GitHub user okram opened a pull request: https://github.com/apache/incubator-tinkerpop/pull/268
TINKERPOP-1082 & TINKERPOP-1222: Hadoop Configuration Updates

https://issues.apache.org/jira/browse/TINKERPOP-1082
https://issues.apache.org/jira/browse/TINKERPOP-1222

We had a very confusing situation with `gremlin.hadoop.graphInputFormat` and `gremlin.spark.graphInputRDD`. Not only did it cause a mess of `[WARN]` messages, it was awkward because users had to know that one overrode the other. To make this cleaner, I created two new configurations, `gremlin.hadoop.graphReader` and `gremlin.hadoop.graphWriter`, each of which can take either an `XXXFormat` or an `XXXRDD`. Internally, Spark/Giraph/etc. know how to reason on what is what.

I also added `gremlin.hadoop.defaultGraphComputer`, where users can specify a default `GraphComputer` in their properties file. If it is set, `graph.compute()` will no longer throw an exception saying to use `graph.compute(class)`.

Both of these changes are backwards compatible. Backwards compatibility is tested via `SparkHadoopGraphProvider` where, via a coin-flip, sometimes the old model is used and sometimes the new model is used.

Finally, I forgot to add docs on `GraphFilter`; they have been added to this PR.

CHANGELOG

```
* Added `gremlin.hadoop.defaultGraphComputer` so users can use `graph.compute()` with `HadoopGraph`.
* Added `gremlin.hadoop.graphReader` and `gremlin.hadoop.graphWriter` which can handle `XXXFormats` and `XXXRDDs`.
* Deprecated `gremlin.hadoop.graphInputFormat`, `gremlin.hadoop.graphOutputFormat`, `gremlin.spark.graphInputRDD`, and `gremlin.spark.graphOutputRDD`.
```

UPDATE

```
Hadoop Configurations
++++++++++++++++++

Note that `gremlin.hadoop.graphInputFormat`, `gremlin.hadoop.graphOutputFormat`, `gremlin.spark.graphInputRDD`, and `gremlin.spark.graphOutputRDD` have all been deprecated. Using them still works, but moving forward, users only need to leverage `gremlin.hadoop.graphReader` and `gremlin.hadoop.graphWriter`. An example properties file snippet is provided below.

gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphReader=org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoInputFormat
gremlin.hadoop.graphWriter=org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoOutputFormat
gremlin.hadoop.jarsInDistributedCache=true
gremlin.hadoop.defaultGraphComputer=org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer
```

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/apache/incubator-tinkerpop TINKERPOP-1082

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-tinkerpop/pull/268.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #268

----
commit 6411d0d4142770f93fb1a188d7e991ed1b4355f3
Author: Marko A. Rodriguez <okramma...@gmail.com>
Date: 2016-03-16T22:01:37Z

    gremlin.hadoop.graphReader and gremlin.hadoop.graphWriter are the new configurations replacing gremlin.hadoop.graphInputFormat and spark.graphInputRDD. Now HadoopGraph can handle either RDD or XXXFormats. Cleaner configurations. Backwards compatible. The older keys just map to the new keys inside HadoopConfiguration.

commit b7f617b383700390128fca53de48f60cda3211fe
Author: Marko A. Rodriguez <okramma...@gmail.com>
Date: 2016-03-16T22:26:22Z

    fixed up the conf/.properties to use graphReader/graphWriter. Found more areas where inputFormat/outputFormat was still being used. Tested Giraph and it's passing completely now. Need a helper utility that converts any Reader/Writer into an InputFormat or OutputFormat automagically.

commit 13561b81aa8287c696b8d79befce42f84792f793
Author: Marko A. Rodriguez <okramma...@gmail.com>
Date: 2016-03-16T22:49:47Z

    ConfUtil does the dirty work of InputRDD or InputFormat conversion to an InputFormat.

commit 5f53589b487ab918719315db6047233fb13971ae
Author: Marko A. Rodriguez <okramma...@gmail.com>
Date: 2016-03-17T14:42:57Z

    added gremlin.hadoop.defaultGraphComputer which allows users to specify in their properties file which GraphComputer to use by default. This allows providers that only support one Hadoop-based OLAP engine to 'hard set' the implementation so the syntax is cleaner -- graph.compute() vs. graph.compute(GiraphGraphComputer.class). This is backwards compatible. The SparkHadoopGraphProvider has been updated to sometimes use compute() and sometimes use compute(class).

commit 4a130d9092bc37dac252536280d60158fe75f74c
Author: Marko A. Rodriguez <okramma...@gmail.com>
Date: 2016-03-17T15:09:16Z

    updated docs on GraphFilter and graphReader/graphWriter.

commit 5a9f56d53741c985982d2bb13d3d8f31ffb6dd85
Author: Marko A. Rodriguez <okramma...@gmail.com>
Date: 2016-03-17T15:32:04Z

    gremlin.hadoop.graphInputFormat.hasEdges is now gremlin.hadoop.graphReader.hasEdges. Likewise for graphOutputFormat. Backwards compatible.

----

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
---
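
For readers who want a concrete picture of the "older keys just map to the new keys" behavior mentioned in the commit notes, here is a minimal sketch in plain Java. It is not the code from `HadoopConfiguration`: the class `LegacyKeyMapper` and its `migrate` method are hypothetical names invented for illustration, and the precedence rules (an explicitly set new key wins, and the deprecated RDD key wins over the deprecated Format key) are assumptions, since the PR only says that one setting overrode the other.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical helper, for illustration only; not part of TinkerPop.
public class LegacyKeyMapper {

    // Deprecated key -> replacement key, as named in the PR description.
    // Insertion order matters: later entries overwrite earlier ones, so the
    // RDD keys take precedence over the Format keys (an assumption).
    private static final Map<String, String> LEGACY_TO_NEW = new LinkedHashMap<>();
    static {
        LEGACY_TO_NEW.put("gremlin.hadoop.graphInputFormat", "gremlin.hadoop.graphReader");
        LEGACY_TO_NEW.put("gremlin.spark.graphInputRDD", "gremlin.hadoop.graphReader");
        LEGACY_TO_NEW.put("gremlin.hadoop.graphOutputFormat", "gremlin.hadoop.graphWriter");
        LEGACY_TO_NEW.put("gremlin.spark.graphOutputRDD", "gremlin.hadoop.graphWriter");
    }

    // Returns a copy of the input in which deprecated keys are rewritten to
    // the new keys. A new key that is set explicitly is never overwritten.
    public static Properties migrate(Properties in) {
        Properties out = new Properties();
        // Copy every key that is not deprecated unchanged.
        for (String key : in.stringPropertyNames()) {
            if (!LEGACY_TO_NEW.containsKey(key)) {
                out.setProperty(key, in.getProperty(key));
            }
        }
        // Map deprecated keys onto the new ones where the new key is absent.
        for (Map.Entry<String, String> e : LEGACY_TO_NEW.entrySet()) {
            String value = in.getProperty(e.getKey());
            if (value != null && in.getProperty(e.getValue()) == null) {
                out.setProperty(e.getValue(), value);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Properties old = new Properties();
        old.setProperty("gremlin.hadoop.graphInputFormat",
                "org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoInputFormat");
        old.setProperty("gremlin.hadoop.jarsInDistributedCache", "true");
        System.out.println(migrate(old));
    }
}
```

With this sketch, a properties file that only sets the deprecated `gremlin.hadoop.graphInputFormat` ends up with the same value under `gremlin.hadoop.graphReader`, while unrelated keys pass through untouched.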