[ https://issues.apache.org/jira/browse/TINKERPOP-1072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15090666#comment-15090666 ]
ASF GitHub Bot commented on TINKERPOP-1072:
-------------------------------------------

Github user twilmes commented on a diff in the pull request:

    https://github.com/apache/incubator-tinkerpop/pull/196#discussion_r49265086

    --- Diff: spark-gremlin/src/test/java/org/apache/tinkerpop/gremlin/spark/structure/io/PersistedInputOutputRDDTest.java ---
    @@ -54,6 +56,44 @@ public class PersistedInputOutputRDDTest extends AbstractSparkTest {
         @Test
    +    public void shouldPersistRDDBasedOnStorageLevel() throws Exception {
    +        Spark.create("local[4]");
    +        int counter = 0;
    +        for (final String storageLevel : Arrays.asList("MEMORY_ONLY", "DISK_ONLY","MEMORY_ONLY_SER","MEMORY_AND_DISK_SER","OFF_HEAP")) {
    +            assertEquals(counter * 2, Spark.getRDDs().size());
    +            counter++;
    +            final String rddName = TestHelper.makeTestDataDirectory(PersistedInputOutputRDDTest.class, UUID.randomUUID().toString());
    +            final Configuration configuration = new BaseConfiguration();
    +            configuration.setProperty("spark.master", "local[4]");
    +            configuration.setProperty("spark.serializer", GryoSerializer.class.getCanonicalName());
    +            configuration.setProperty(Graph.GRAPH, HadoopGraph.class.getName());
    +            configuration.setProperty(Constants.GREMLIN_HADOOP_INPUT_LOCATION, SparkHadoopGraphProvider.PATHS.get("tinkerpop-modern.kryo"));
    +            configuration.setProperty(Constants.GREMLIN_HADOOP_GRAPH_INPUT_FORMAT, GryoInputFormat.class.getCanonicalName());
    +            configuration.setProperty(Constants.GREMLIN_SPARK_GRAPH_OUTPUT_RDD, PersistedOutputRDD.class.getCanonicalName());
    +            configuration.setProperty(Constants.GREMLIN_SPARK_PERSIST_STORAGE_LEVEL, storageLevel);
    +            configuration.setProperty(Constants.GREMLIN_HADOOP_JARS_IN_DISTRIBUTED_CACHE, false);
    +            configuration.setProperty(Constants.GREMLIN_HADOOP_OUTPUT_LOCATION, rddName);
    +            configuration.setProperty(Constants.GREMLIN_SPARK_PERSIST_CONTEXT, true);
    +            Graph graph = GraphFactory.open(configuration);
    +            graph.compute(SparkGraphComputer.class)
    +                    .result(GraphComputer.ResultGraph.NEW)
    +                    .persist(GraphComputer.Persist.EDGES)
    +                    .program(TraversalVertexProgram.build()
    +                            .traversal(GraphTraversalSource.build().engine(ComputerTraversalEngine.build().computer(SparkGraphComputer.class)),
    +                                    "gremlin-groovy",
    +                                    "g.V()").create(graph)).submit().get();
    +            ////////
    +            assertTrue(Spark.hasRDD(Constants.getGraphLocation(rddName)));
    +            assertEquals(StorageLevel.fromString(storageLevel), Spark.getRDD(Constants.getGraphLocation(rddName)).getStorageLevel());
    +            assertTrue(Spark.hasRDD(Constants.getMemoryLocation(rddName, Graph.Hidden.hide("traversers"))));
    +            assertEquals(StorageLevel.fromString(storageLevel), Spark.getRDD(Constants.getMemoryLocation(rddName, Graph.Hidden.hide("traversers"))).getStorageLevel());
    +            assertEquals(counter * 2, Spark.getRDDs().size());
    +            //System.out.println(SparkContextStorage.open().ls());
    --- End diff --

    Looks like there was a lingering debug println here that could be removed.


> Allow the user to set persistence options using StorageLevel.valueOf()
> ----------------------------------------------------------------------
>
>                 Key: TINKERPOP-1072
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP-1072
>             Project: TinkerPop
>          Issue Type: Improvement
>          Components: hadoop
>    Affects Versions: 3.1.0-incubating
>            Reporter: Marko A. Rodriguez
>            Assignee: Marko A. Rodriguez
>             Fix For: 3.1.1-incubating
>
>
> I always thought there was a Spark option to say stuff like
> {{default.persist=DISK_SER_1}}, but I can't seem to find it.
> If no such option exists, then we should add it to Spark-Gremlin. For
> instance:
> {code}
> gremlin.spark.storageLevel=DISK_ONLY
> {code}
> See:
> http://spark.apache.org/docs/latest/programming-guide.html#rdd-persistence
> Then we would need to go through the code and, wherever we have {{...cache()}}
> calls, change them to
> {{....persist(StorageLevel.valueOf(conf.get("gremlin.spark.storageLevel","MEMORY_ONLY")))}}.
> The question then becomes: do we provide flexibility so the user can have
> the program caching differ from the persisted RDD caching? :| Too many
> configurations sucks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
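A minimal, JDK-only sketch of the lookup the issue proposes: read a storage-level name from configuration with a MEMORY_ONLY default, reject names Spark would not recognize, and (one possible answer to the closing question) let an optional second key override the level for the persisted output RDD only. The class StorageLevelConfig, its methods, and the key gremlin.spark.persistStorageLevel are hypothetical illustrations. Real code would hand the resolved name to Spark's StorageLevel.fromString(), as the test in the diff does, instead of checking a hard-coded list.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/**
 * Sketch only: resolves Spark storage-level names from a flat key/value configuration.
 * Key names are hypothetical; the name set mirrors (but does not call) StorageLevel.fromString().
 */
public class StorageLevelConfig {

    // Hypothetical keys: one global level, one optional override for persisted output RDDs.
    static final String STORAGE_LEVEL = "gremlin.spark.storageLevel";
    static final String PERSIST_STORAGE_LEVEL = "gremlin.spark.persistStorageLevel";
    static final String DEFAULT_LEVEL = "MEMORY_ONLY";

    // Level names accepted by StorageLevel.fromString() in Spark 1.x.
    static final Set<String> KNOWN_LEVELS = Collections.unmodifiableSet(new LinkedHashSet<String>(Arrays.asList(
            "NONE", "DISK_ONLY", "DISK_ONLY_2",
            "MEMORY_ONLY", "MEMORY_ONLY_2", "MEMORY_ONLY_SER", "MEMORY_ONLY_SER_2",
            "MEMORY_AND_DISK", "MEMORY_AND_DISK_2", "MEMORY_AND_DISK_SER", "MEMORY_AND_DISK_SER_2",
            "OFF_HEAP")));

    /** Level to use wherever the code previously called cache(): configured value, else MEMORY_ONLY. */
    static String programLevel(Map<String, String> conf) {
        return validate(conf.getOrDefault(STORAGE_LEVEL, DEFAULT_LEVEL));
    }

    /** Level for the persisted output RDD: its own key if set, otherwise the program level. */
    static String persistLevel(Map<String, String> conf) {
        String override = conf.get(PERSIST_STORAGE_LEVEL);
        return override == null ? programLevel(conf) : validate(override);
    }

    // Fail fast on unknown names with an IllegalArgumentException, as StorageLevel.fromString() does.
    private static String validate(String name) {
        if (!KNOWN_LEVELS.contains(name)) {
            throw new IllegalArgumentException("Invalid StorageLevel: " + name);
        }
        return name;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        System.out.println(programLevel(conf) + " " + persistLevel(conf)); // MEMORY_ONLY MEMORY_ONLY
        conf.put(STORAGE_LEVEL, "DISK_ONLY");
        System.out.println(programLevel(conf) + " " + persistLevel(conf)); // DISK_ONLY DISK_ONLY
        conf.put(PERSIST_STORAGE_LEVEL, "MEMORY_AND_DISK_SER");
        System.out.println(programLevel(conf) + " " + persistLevel(conf)); // DISK_ONLY MEMORY_AND_DISK_SER
    }
}
```

Falling back from the persist key to the global key keeps the common case at a single setting, which speaks to the "too many configurations" concern: users who never set the second key see exactly the single-option behavior.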