[ 
https://issues.apache.org/jira/browse/TINKERPOP3-845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15023105#comment-15023105
 ] 

Marko A. Rodriguez commented on TINKERPOP3-845:
-----------------------------------------------

We do use Spark caching, but it is internal to TinkerPop and should not be 
controlled by the user. For instance, we cache {{graphRDD}} and {{mapRDD}} (in 
MapReduce) because they are reused over and over. The other RDDs spawned along 
the way are transient (not reused), so you would not cache them.

For your example of reusing the "friendship-graph" across analyses, that is 
simply about leveraging {{PersistedOutputRDD}}/{{PersistedInputRDD}}. Again, 
the friendship {{graphRDD}} will be cached and, to share it across analyses, 
you just reference it from a persisted context.
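
As a rough illustration of that pattern, the sketch below builds the two sets of properties such a setup might use: one for the job that produces the friendship graph and leaves it cached, and one for a later analysis that reads it back by name. The property keys and class names are assumptions based on the SparkGraphComputer documentation of roughly the 3.1 line and may differ in other releases; the helper functions are hypothetical and only assemble plain dictionaries.

```python
# Hypothetical sketch. The keys and class names below are assumptions taken
# from TinkerPop 3.1-era SparkGraphComputer docs; check them for your release.
PERSISTED_OUTPUT_RDD = "org.apache.tinkerpop.gremlin.spark.structure.io.PersistedOutputRDD"
PERSISTED_INPUT_RDD = "org.apache.tinkerpop.gremlin.spark.structure.io.PersistedInputRDD"


def producer_config(rdd_name):
    """Properties for the job that builds the graph and leaves it cached."""
    return {
        "gremlin.spark.graphOutputRDD": PERSISTED_OUTPUT_RDD,
        # Assumption: the persisted RDD is named after the output location.
        "gremlin.hadoop.outputLocation": rdd_name,
        # Keep the Spark context alive so the cached RDD outlives the job.
        "gremlin.spark.persistContext": "true",
    }


def consumer_config(rdd_name):
    """Properties for a later analysis that reuses the cached graph."""
    return {
        "gremlin.spark.graphInputRDD": PERSISTED_INPUT_RDD,
        # Assumption: the same name is used to look the RDD up again.
        "gremlin.hadoop.inputLocation": rdd_name,
        "gremlin.spark.persistContext": "true",
    }


def to_properties(config):
    """Render a config dict as the text of a .properties file."""
    return "\n".join(f"{key}={value}" for key, value in sorted(config.items()))


if __name__ == "__main__":
    print(to_properties(producer_config("friendship-graph")))
    print()
    print(to_properties(consumer_config("friendship-graph")))
```

Note that nothing here asks the user to call a cache operation directly: the user only names the persisted graph, and the caching of {{graphRDD}} stays inside TinkerPop.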

Anything else besides caching that this ticket pertains to?

> Spark Execution Options
> -----------------------
>
>                 Key: TINKERPOP3-845
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP3-845
>             Project: TinkerPop 3
>          Issue Type: Improvement
>          Components: hadoop
>    Affects Versions: 3.0.2-incubating
>            Reporter: Matthias Broecheler
>
> In Spark, the user has some control over how the Spark job should be 
> executed. In particular, the user can control if certain RDDs should be 
> cached. In SparkSQL this is exposed via custom SQLish commands to the user. 
> We should investigate if something similar can be done in Gremlin to give the 
> user more control over their job execution.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
