GitHub user vtslab commented on a diff in the pull request:

    https://github.com/apache/tinkerpop/pull/721#discussion_r142223415
  
    --- Diff: hadoop-gremlin/conf/hadoop-gryo.properties ---
    @@ -29,8 +29,8 @@ gremlin.hadoop.outputLocation=output
     spark.master=local[4]
     spark.executor.memory=1g
     
     spark.serializer=org.apache.tinkerpop.gremlin.spark.structure.io.gryo.GryoSerializer
    +gremlin.spark.persistContext=true
    --- End diff --
    
    Good question, I had not justified this yet. My original reason was that 
stopping both the SparkContext and the Gremlin Console, as the docs 
generation does, can lead to race conditions in spark-yarn, with random 
connection exceptions showing up in the console output in the docs. As a 
bonus, follow-up OLAP queries are answered much faster because they skip the 
overhead of acquiring resources from YARN. Apache Zeppelin, the Spark shell 
and similar tools do the same.
    
    The alternative is to set the property in the console, together with the 
other properties. This would require some more explanation in the recipe and 
more configuration work from its users, but would leave the properties file 
untouched. I prefer the current proposal, but I am fine with either.
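
    For reference, the console-side alternative might look roughly like the 
sketch below. It assumes the recipe loads the file into a Commons 
Configuration `PropertiesConfiguration` and opens the graph with 
`GraphFactory.open(conf)`. The file path and the `yarn-client` master value 
are illustrative assumptions, not taken from this diff.

```groovy
// Sketch only: set the property in the Gremlin Console instead of in the file.
// The path and the spark.master value below are assumptions for illustration.
conf = new PropertiesConfiguration('conf/hadoop/hadoop-gryo.properties')
conf.setProperty('spark.master', 'yarn-client')
// Keep the SparkContext alive between OLAP jobs (the setting under discussion).
conf.setProperty('gremlin.spark.persistContext', 'true')
graph = GraphFactory.open(conf)
```

    With this variant hadoop-gryo.properties stays as it is, and each recipe 
user has to add the extra `setProperty` line themselves.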

