Github user vtslab commented on a diff in the pull request:
https://github.com/apache/tinkerpop/pull/721#discussion_r142223415
--- Diff: hadoop-gremlin/conf/hadoop-gryo.properties ---
@@ -29,8 +29,8 @@ gremlin.hadoop.outputLocation=output
spark.master=local[4]
spark.executor.memory=1g
spark.serializer=org.apache.tinkerpop.gremlin.spark.structure.io.gryo.GryoSerializer
+gremlin.spark.persistContext=true
--- End diff --
Good question, I had not justified this yet. My original reason was that
stopping both the SparkContext and the Gremlin Console, as happens during
docs generation, can lead to race conditions in spark-yarn, with random
connection exceptions showing up in the console output in the docs. As a
bonus, follow-up OLAP queries are answered much faster because they skip the
overhead of acquiring resources from YARN. Apache Zeppelin, the Spark shell
and similar tools do the same.
The alternative is to set the property in the console together with the
other properties. That would leave the properties file untouched, but it
would require more explanation in the recipe and more configuration work
from recipe users. I prefer the current proposal, but I am fine with either.
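
For reference, the console alternative could look roughly like the sketch
below. The properties path and the variable names are assumptions for
illustration and are not taken from this pull request; the recipe would use
whatever it already defines.

```groovy
// Sketch only: the path and variable names are assumed, adjust to the recipe.
graph = GraphFactory.open('conf/hadoop/hadoop-gryo.properties')
// Set the property per session instead of editing the properties file.
graph.configuration().setProperty('gremlin.spark.persistContext', true)
g = graph.traversal().withComputer(SparkGraphComputer)
// Later OLAP traversals in this session reuse the same SparkContext.
g.V().count()
```

With this approach the shipped hadoop-gryo.properties keeps its defaults,
and each recipe user opts in explicitly.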
---