[ https://issues.apache.org/jira/browse/TINKERPOP3-925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14983094#comment-14983094 ]
ASF GitHub Bot commented on TINKERPOP3-925:
-------------------------------------------
Github user RussellSpitzer commented on a diff in the pull request:
https://github.com/apache/incubator-tinkerpop/pull/129#discussion_r43540641
--- Diff: spark-gremlin/src/main/java/org/apache/tinkerpop/gremlin/spark/process/computer/SparkGraphComputer.java ---
@@ -72,28 +73,40 @@ public SparkGraphComputer(final HadoopGraph hadoopGraph) {
     @Override
     public GraphComputer workers(final int workers) {
         super.workers(workers);
-        if (this.sparkConfiguration.getString("spark.master").startsWith("local")) {
-            this.sparkConfiguration.setProperty("spark.master", "local[" + this.workers + "]");
+        if (this.sparkConfiguration.getString(SparkLauncher.SPARK_MASTER).startsWith("local")) {
+            this.sparkConfiguration.setProperty(SparkLauncher.SPARK_MASTER, "local[" + this.workers + "]");
         }
         return this;
     }

     @Override
-    public GraphComputer config(final String key, final Object value) {
+    public GraphComputer configure(final String key, final Object value) {
         this.sparkConfiguration.setProperty(key, value);
         return this;
     }

     @Override
-    public Future<ComputerResult> submit() {
+    protected void validateStatePriorToExecution() {
+        super.validateStatePriorToExecution();
+        if (this.sparkConfiguration.containsKey(Constants.GREMLIN_SPARK_GRAPH_INPUT_RDD) && this.sparkConfiguration.containsKey(Constants.GREMLIN_HADOOP_GRAPH_INPUT_FORMAT))
--- End diff ---
This line is a little long; you may want to break it up for readability (one possible sketch follows).
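A minimal sketch of one way to shorten it, assuming the condition stays inside validateStatePriorToExecution(); the boolean names are illustrative (not from the pull request), and the if-body is elided in the diff above:

    // Hypothetical refactoring of the long condition flagged above.
    // The boolean names are illustrative; the if-body is elided in the diff.
    final boolean hasInputRDD =
            this.sparkConfiguration.containsKey(Constants.GREMLIN_SPARK_GRAPH_INPUT_RDD);
    final boolean hasInputFormat =
            this.sparkConfiguration.containsKey(Constants.GREMLIN_HADOOP_GRAPH_INPUT_FORMAT);
    if (hasInputRDD && hasInputFormat) {
        // ... (body as in the pull request, elided in the diff above)
    }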
> Use persisted SparkContext to persist an RDD across Spark jobs.
> ---------------------------------------------------------------
>
> Key: TINKERPOP3-925
> URL: https://issues.apache.org/jira/browse/TINKERPOP3-925
> Project: TinkerPop 3
> Issue Type: Improvement
> Components: hadoop
> Affects Versions: 3.0.2-incubating
> Reporter: Marko A. Rodriguez
> Assignee: Marko A. Rodriguez
> Fix For: 3.1.0-incubating
>
>
> If a provider is using Spark, they are currently forced to use HDFS to
> store intermediate RDD data. However, if they plan on using that data in a
> {{GraphComputer}} "job chain," then they should be able to look up a
> {{.cache()}}'d RDD by name.
> Create an {{inputGraphRDD.name}} and an {{outputGraphRDD.name}} so that
> the configuration references {{SparkContext.getPersistentRDDs()}}.
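For context, a minimal sketch of what such a by-name lookup could look like, assuming a live SparkContext. SparkContext.getPersistentRDDs() is keyed by RDD id, not name, so a by-name lookup has to scan for a matching name(); the class and method names below are hypothetical, not TinkerPop APIs:

    import java.util.Map;
    import org.apache.spark.SparkContext;
    import org.apache.spark.rdd.RDD;
    import scala.collection.JavaConversions;

    // Hypothetical helper: resolve a persisted RDD by the name configured
    // under a key such as inputGraphRDD.name. getPersistentRDDs() maps
    // RDD id -> RDD, so we scan its values for one whose name() matches.
    public final class PersistedRDDs {
        public static RDD<?> lookup(final SparkContext sparkContext, final String name) {
            final Map<Object, RDD<?>> persisted =
                    JavaConversions.mapAsJavaMap(sparkContext.getPersistentRDDs());
            for (final RDD<?> rdd : persisted.values()) {
                if (name.equals(rdd.name()))
                    return rdd; // e.g. previously persisted via rdd.setName(name).cache()
            }
            throw new IllegalArgumentException("No persisted RDD named " + name);
        }
    }

A provider could then chain jobs by setting outputGraphRDD.name on one job and the same value as inputGraphRDD.name on the next, with no HDFS round-trip in between.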
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)