[ https://issues.apache.org/jira/browse/TINKERPOP-1072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15089630#comment-15089630 ]
ASF GitHub Bot commented on TINKERPOP-1072:
-------------------------------------------

GitHub user okram opened a pull request:

    https://github.com/apache/incubator-tinkerpop/pull/196

    TINKERPOP-1072: Allow the user to set persistence options using StorageLevel.valueOf()

    https://issues.apache.org/jira/browse/TINKERPOP-1072

    I always thought Spark had some configuration like `default.storageLevel` that `cache()` would fall back to. I was wrong: `cache()` is always `MEMORY_ONLY`. I made it so you can specify the storage level for both persisted RDDs and runtime job RDDs, and thus now (internally) use `persist(STORAGE_LEVEL)`, where `MEMORY_ONLY` is the default. Test cases, docs, and Spark integration tests pass.

    VOTE +1.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/apache/incubator-tinkerpop TINKERPOP-1072

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-tinkerpop/pull/196.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #196

----
commit 4082a4a043b54c102f49f220b14e2644817e1222
Author: Marko A. Rodriguez <okramma...@gmail.com>
Date:   2016-01-08T18:05:08Z

    Allow the user to specify the persistence StorageLevel for both the
    computed job graph and any PersistedOutputRDD data. Updated docs and the
    example conf, and added a test case that validates that what is persisted
    to SparkStorage is correct as the configuration changes.

----

> Allow the user to set persistence options using StorageLevel.valueOf()
> ----------------------------------------------------------------------
>
>                 Key: TINKERPOP-1072
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP-1072
>             Project: TinkerPop
>          Issue Type: Improvement
>          Components: hadoop
>    Affects Versions: 3.1.0-incubating
>            Reporter: Marko A. Rodriguez
>            Assignee: Marko A. Rodriguez
>             Fix For: 3.1.1-incubating
>
> I always thought there was a Spark option to say stuff like
> {{default.persist=DISK_SER_1}}, but I can't seem to find it.
> If no such option exists, then we should add it to Spark-Gremlin. For
> instance:
> {code}
> gremlin.spark.storageLevel=DISK_ONLY
> {code}
> See:
> http://spark.apache.org/docs/latest/programming-guide.html#rdd-persistence
> Then we would need to go through and where we have {{...cache()}} calls, they
> need to be changed to
> {{...persist(StorageLevel.valueOf(conf.get("gremlin.spark.storageLevel", "MEMORY_ONLY")))}}.
> The question then becomes: do we provide flexibility where the user can have
> the program caching different from the persisted RDD caching? Too many
> configurations sucks.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
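The pattern discussed in the thread, resolving a storage level from configuration with `MEMORY_ONLY` as the default (matching what Spark's `cache()` always uses), can be sketched as below. This is a minimal, hypothetical illustration, not the PR's actual code: the `StorageLevel` enum here is a stand-in for Spark's `org.apache.spark.storage.StorageLevel` so the snippet runs without a Spark dependency, the `resolve` helper is invented for illustration, and only the key `gremlin.spark.storageLevel` comes from the issue text (the second key is made up).

```java
import java.util.HashMap;
import java.util.Map;

public class StorageLevelSketch {

    // Stand-in for org.apache.spark.storage.StorageLevel (illustrative subset).
    enum StorageLevel { MEMORY_ONLY, MEMORY_AND_DISK, DISK_ONLY }

    // Hypothetical helper: look up the configured level, defaulting to
    // MEMORY_ONLY -- the level Spark's cache() always applies.
    static StorageLevel resolve(Map<String, String> conf, String key) {
        return StorageLevel.valueOf(conf.getOrDefault(key, "MEMORY_ONLY"));
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("gremlin.spark.storageLevel", "DISK_ONLY");

        // A configured key yields the requested persistence level...
        System.out.println(resolve(conf, "gremlin.spark.storageLevel"));

        // ...while an unset key falls back to cache()'s MEMORY_ONLY
        // (this second key is purely illustrative).
        System.out.println(resolve(conf, "gremlin.spark.jobStorageLevel"));
    }
}
```

With a helper like this, every internal `rdd.cache()` call can become `rdd.persist(resolve(conf, key))`, which behaves identically when nothing is configured but lets the user choose, for example, `DISK_ONLY` for large graphs.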