[ https://issues.apache.org/jira/browse/TINKERPOP-1072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15089630#comment-15089630 ]

ASF GitHub Bot commented on TINKERPOP-1072:
-------------------------------------------

GitHub user okram opened a pull request:

    https://github.com/apache/incubator-tinkerpop/pull/196

    TINKERPOP-1072: Allow the user to set persistence options using StorageLevel.valueOf()

    https://issues.apache.org/jira/browse/TINKERPOP-1072
    
    I always thought Spark had some configuration like `default.storageLevel` 
such that when a user called `cache()` it would use that default. I was wrong: 
`cache()` is always `MEMORY_ONLY`. I made it so you can specify the storage 
level for both persisted RDDs and runtime job RDDs, and we now (internally) 
use `persist(STORAGE_LEVEL)`, where `MEMORY_ONLY` is the default.
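    A minimal sketch of the lookup-with-default behavior described above 
(the class, method, and property-key names here are illustrative, not 
TinkerPop's actual ones). Note that in Spark the resolved name would then go 
through `StorageLevel.fromString(...)`, since `org.apache.spark.storage.StorageLevel` 
is not a Java enum:

```java
import java.util.Map;

// Sketch only: class and key names are illustrative assumptions.
public class StorageLevelConfig {

    /**
     * Resolve the configured storage-level name, falling back to MEMORY_ONLY,
     * which matches what a plain cache() call would have done.
     */
    static String resolveLevel(Map<String, String> conf, String key) {
        return conf.getOrDefault(key, "MEMORY_ONLY");
    }

    public static void main(String[] args) {
        Map<String, String> conf = Map.of("gremlin.spark.graphStorageLevel", "DISK_ONLY");
        System.out.println(resolveLevel(conf, "gremlin.spark.graphStorageLevel"));   // DISK_ONLY
        System.out.println(resolveLevel(conf, "gremlin.spark.persistStorageLevel")); // MEMORY_ONLY
        // In Spark-Gremlin the resolved name would then feed something like:
        //   rdd.persist(StorageLevel.fromString(levelName));
    }
}
```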
    
    Test cases, docs, and Spark integration tests pass.
    
    VOTE +1.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/apache/incubator-tinkerpop TINKERPOP-1072

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-tinkerpop/pull/196.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #196
    
----
commit 4082a4a043b54c102f49f220b14e2644817e1222
Author: Marko A. Rodriguez <okramma...@gmail.com>
Date:   2016-01-08T18:05:08Z

    Allow the user to specify the persistence StorageLevel for both the 
computed job graph and any PersistedOutputRDD data. Updated docs and the 
example conf, and added a test case that validates that what is persisted to 
Spark storage is correct as the configuration changes.

----


> Allow the user to set persistence options using StorageLevel.valueOf()
> ----------------------------------------------------------------------
>
>                 Key: TINKERPOP-1072
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP-1072
>             Project: TinkerPop
>          Issue Type: Improvement
>          Components: hadoop
>    Affects Versions: 3.1.0-incubating
>            Reporter: Marko A. Rodriguez
>            Assignee: Marko A. Rodriguez
>             Fix For: 3.1.1-incubating
>
>
> I always thought there was a Spark option to say stuff like 
> {{default.persist=DISK_SER_1}}, but I can't seem to find it.
> If no such option exists, then we should add it to Spark-Gremlin. For 
> instance:
> {code}
> gremlin.spark.storageLevel=DISK_ONLY
> {code}
> See: 
> http://spark.apache.org/docs/latest/programming-guide.html#rdd-persistence
> Then we would need to go through our {{...cache()}} calls and change them to 
> {{...persist(StorageLevel.valueOf(conf.get("gremlin.spark.storageLevel", "MEMORY_ONLY")))}}.
> The question then becomes: do we provide flexibility so the user can 
> configure runtime job caching differently from persisted RDD caching? :| ... 
> Too many configuration options sucks.
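The two-knob flexibility weighed above is what the pull request ends up implementing (one level for runtime job RDDs, one for persisted output). In a Hadoop-Gremlin properties file that could look like the following sketch (the key names are illustrative assumptions, not confirmed release names):

```
# Storage level for RDDs the job creates at runtime (MEMORY_ONLY mirrors cache())
gremlin.spark.graphStorageLevel=MEMORY_ONLY
# Storage level for graphs written via PersistedOutputRDD
gremlin.spark.persistStorageLevel=DISK_ONLY
```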



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
