[ https://issues.apache.org/jira/browse/TINKERPOP3-925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14983094#comment-14983094 ]
ASF GitHub Bot commented on TINKERPOP3-925:
-------------------------------------------
Github user RussellSpitzer commented on a diff in the pull request:
https://github.com/apache/incubator-tinkerpop/pull/129#discussion_r43540641
--- Diff: spark-gremlin/src/main/java/org/apache/tinkerpop/gremlin/spark/process/computer/SparkGraphComputer.java ---
@@ -72,28 +73,40 @@ public SparkGraphComputer(final HadoopGraph hadoopGraph) {
     @Override
     public GraphComputer workers(final int workers) {
         super.workers(workers);
-        if (this.sparkConfiguration.getString("spark.master").startsWith("local")) {
-            this.sparkConfiguration.setProperty("spark.master", "local[" + this.workers + "]");
+        if (this.sparkConfiguration.getString(SparkLauncher.SPARK_MASTER).startsWith("local")) {
+            this.sparkConfiguration.setProperty(SparkLauncher.SPARK_MASTER, "local[" + this.workers + "]");
         }
         return this;
     }

     @Override
-    public GraphComputer config(final String key, final Object value) {
+    public GraphComputer configure(final String key, final Object value) {
         this.sparkConfiguration.setProperty(key, value);
         return this;
     }

     @Override
-    public Future<ComputerResult> submit() {
+    protected void validateStatePriorToExecution() {
+        super.validateStatePriorToExecution();
+        if (this.sparkConfiguration.containsKey(Constants.GREMLIN_SPARK_GRAPH_INPUT_RDD) && this.sparkConfiguration.containsKey(Constants.GREMLIN_HADOOP_GRAPH_INPUT_FORMAT))
--- End diff ---
This line is a little long; you may want to break it up for readability (one possible sketch follows).
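A minimal sketch of one way to shorten it, assuming the condition stays inside validateStatePriorToExecution(); the boolean names are illustrative (not from the pull request), and the if-body is elided in the diff above:

    // Hypothetical refactoring of the long condition flagged above.
    // The boolean names are illustrative; the if-body is elided in the diff.
    final boolean hasInputRDD =
            this.sparkConfiguration.containsKey(Constants.GREMLIN_SPARK_GRAPH_INPUT_RDD);
    final boolean hasInputFormat =
            this.sparkConfiguration.containsKey(Constants.GREMLIN_HADOOP_GRAPH_INPUT_FORMAT);
    if (hasInputRDD && hasInputFormat) {
        // ... (body as in the pull request, elided in the diff above)
    }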
> Use persisted SparkContext to persist an RDD across Spark jobs.
> ---------------------------------------------------------------
>
> Key: TINKERPOP3-925
> URL: https://issues.apache.org/jira/browse/TINKERPOP3-925
> Project: TinkerPop 3
> Issue Type: Improvement
> Components: hadoop
> Affects Versions: 3.0.2-incubating
> Reporter: Marko A. Rodriguez
> Assignee: Marko A. Rodriguez
> Fix For: 3.1.0-incubating
>
>
> If a provider is using Spark, they are currently forced to use HDFS to
> store intermediate RDD data. However, if they plan on using that data in a
> {{GraphComputer}} "job chain," then they should be able to look up a
> {{.cache()}}'d RDD by name.
> Create an {{inputGraphRDD.name}} and an {{outputGraphRDD.name}} so that
> the configuration references {{SparkContext.getPersistentRDDs()}}.
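For context, a minimal sketch of what such a by-name lookup could look like, assuming a live SparkContext. SparkContext.getPersistentRDDs() is keyed by RDD id, not name, so a by-name lookup has to scan for a matching name(); the class and method names below are hypothetical, not TinkerPop APIs:

    import java.util.Map;
    import org.apache.spark.SparkContext;
    import org.apache.spark.rdd.RDD;
    import scala.collection.JavaConversions;

    // Hypothetical helper: resolve a persisted RDD by the name configured
    // under a key such as inputGraphRDD.name. getPersistentRDDs() maps
    // RDD id -> RDD, so we scan its values for one whose name() matches.
    public final class PersistedRDDs {
        public static RDD<?> lookup(final SparkContext sparkContext, final String name) {
            final Map<Object, RDD<?>> persisted =
                    JavaConversions.mapAsJavaMap(sparkContext.getPersistentRDDs());
            for (final RDD<?> rdd : persisted.values()) {
                if (name.equals(rdd.name()))
                    return rdd; // e.g. previously persisted via rdd.setName(name).cache()
            }
            throw new IllegalArgumentException("No persisted RDD named " + name);
        }
    }

A provider could then chain jobs by setting outputGraphRDD.name on one job and the same value as inputGraphRDD.name on the next, with no HDFS round-trip in between.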
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)