[ 
https://issues.apache.org/jira/browse/TINKERPOP3-845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15023054#comment-15023054
 ] 

Matthias Broecheler commented on TINKERPOP3-845:
------------------------------------------------

[~okram] For instance, Shark (SQL on Spark) lets you define tables with 
particular caching options, as explained in the documentation 
(https://github.com/amplab/shark/wiki/Shark-User-Guide):
{{CREATE TABLE ... TBLPROPERTIES ("shark.cache" = "true") AS SELECT ...}}

The part {{("shark.cache" = "true")}} is not standard SQL but a Shark-specific 
extension that lets the user control the caching behavior of RDDs.
Something similar would be useful for Gremlin on top of Spark. If you know 
that you will run many analyses on the friendship graph, it makes sense to 
cache the RDD that holds the friendship part of the graph and reuse it across 
those analyses.
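
To make the reuse idea concrete, here is a minimal sketch in plain Python rather than Spark or Gremlin. It only illustrates the pattern of extracting a subgraph once and sharing it between analyses; in Spark itself the corresponding mechanism is marking an RDD with {{cache()}} or {{persist()}}. The edge list and the names {{EDGES}}, {{subgraph}}, {{out_degree}} and {{edge_count}} are all hypothetical and not part of any TinkerPop or Spark API.

```python
from functools import lru_cache

# Hypothetical toy graph as (source, label, target) triples.
EDGES = (
    ("alice", "friend", "bob"),
    ("bob", "friend", "carol"),
    ("carol", "friend", "alice"),
    ("alice", "bought", "book"),
)


@lru_cache(maxsize=None)
def subgraph(label):
    """Extract the edges with the given label.

    lru_cache stands in for a cached RDD: the filter runs once per label
    and later callers receive the stored result.
    """
    return tuple((src, dst) for src, lbl, dst in EDGES if lbl == label)


def out_degree(label):
    """First analysis: count outgoing edges per vertex in the subgraph."""
    degrees = {}
    for src, _dst in subgraph(label):
        degrees[src] = degrees.get(src, 0) + 1
    return degrees


def edge_count(label):
    """Second analysis: reuses the same cached subgraph."""
    return len(subgraph(label))
```

Calling {{out_degree("friend")}} and then {{edge_count("friend")}} filters the edge list only once; {{subgraph.cache_info()}} reports one miss and one hit. The open question for this ticket is how a Gremlin user would express the equivalent hint for a Spark job.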

> Spark Execution Options
> -----------------------
>
>                 Key: TINKERPOP3-845
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP3-845
>             Project: TinkerPop 3
>          Issue Type: Improvement
>          Components: hadoop
>    Affects Versions: 3.0.2-incubating
>            Reporter: Matthias Broecheler
>
> In Spark, the user has some control over how the Spark job is executed. In 
> particular, the user can control whether certain RDDs are cached. SparkSQL 
> exposes this to the user through custom SQL-like commands. We should 
> investigate whether something similar can be done in Gremlin to give the 
> user more control over their job execution.



