For historical reasons, some code I borrowed to do the similarity stuff had the following Spark settings:
sparkConf.set("spark.kryo.referenceTracking", "false")
  .set("spark.kryoserializer.buffer.mb", "200") // TODO: should this be left to config or an option?
I’m not all that familiar with Kryo. Are these things better left to a
-D:key=value type param? Seems like they shouldn’t be hard-coded unless
tracking is universal. Any opinions?
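
To make the question concrete, here is a rough sketch of what the -D:key=value route might look like: keep the two values above as defaults and let anything passed on the command line override them. The names SparkSettings, parseOverrides and effective are made up for illustration and are not from any existing code; the parsing is deliberately minimal.

```scala
object SparkSettings {
  // Defaults taken from the hard-coded values above.
  val defaults: Map[String, String] = Map(
    "spark.kryo.referenceTracking" -> "false",
    "spark.kryoserializer.buffer.mb" -> "200"
  )

  private val Prefix = "-D:"

  // Collect "-D:key=value" arguments into a map.
  // Arguments without the prefix, or with an empty key, are ignored.
  def parseOverrides(args: Seq[String]): Map[String, String] =
    args.collect {
      case arg if arg.startsWith(Prefix) && arg.indexOf('=') > Prefix.length =>
        val body = arg.substring(Prefix.length)
        val eq = body.indexOf('=')
        body.substring(0, eq) -> body.substring(eq + 1)
    }.toMap

  // Command-line overrides win over the defaults.
  def effective(args: Seq[String]): Map[String, String] =
    defaults ++ parseOverrides(args)
}
```

The resulting map could then be applied with something like SparkSettings.effective(args).foreach { case (k, v) => sparkConf.set(k, v) }, so the defaults stay in one place and nothing is locked in.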
