Hi Ogata,
Thanks, I'll try with a Java equivalent of the structure below. I can
change it later if needed. Stay tuned...
Regards, Peter
On 09/21/2017 11:39 AM, Kazunori Ogata wrote:
Hi Peter,
If you could extract from
the benchmark just the approximate shape of the data structure and the
typical values it contains, I could create a JMH benchmark that tests
just that part, which would be appropriate for tuning the serialization code.
My colleague investigated the objects serialized/deserialized in the
benchmark. He found there are several types of object trees, and one of
the largest object tree looks like (in Scala types):
scala.Tuple2[1]
`-scala.Tuple2
+-Int
`-scala.Tuple2
+-org.apache.spark.ml.tree.ContinuousSplit
| +-Int
| `-Boolean
`-org.apache.spark.mllib.tree.model.ImpurityStats
+-Double
+-Double
+-org.apache.spark.mllib.tree.impurity.VarianceCalculator
| `-Double[3]
+-org.apache.spark.mllib.tree.impurity.VarianceCalculator
| `-Double[3]
+-org.apache.spark.mllib.tree.impurity.VarianceCalculator
| `-Double[3]
`-Boolean
Now the question is what the Java classes (including the class hierarchy)
look like, because Scala types may need extra boxing/unboxing (though I'm
not sure). I'll try to decompile the .class files generated from the Scala
code and find the real Java types.
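As a starting point, a rough Java approximation of the tree above might look
like the sketch below, using java.io serialization for a round trip. All class
and field names here are guesses from the tree shape; the real Spark classes
(and whatever the Scala compiler actually emits) will differ, so this is only
a stand-in for a microbenchmark, not the real types:

```java
import java.io.*;

public class SerdeShape {

    // Stand-in for scala.Tuple2; with Integer as a type argument the Int
    // field is boxed, as with Scala generics.
    static class Pair<A, B> implements Serializable {
        final A _1; final B _2;
        Pair(A a, B b) { _1 = a; _2 = b; }
    }

    // Guess at org.apache.spark.ml.tree.ContinuousSplit: Int + Boolean
    static class ContinuousSplit implements Serializable {
        final int featureIndex; final boolean flag;
        ContinuousSplit(int f, boolean b) { featureIndex = f; flag = b; }
    }

    // Guess at o.a.s.mllib.tree.impurity.VarianceCalculator: Double[3]
    static class VarianceCalculator implements Serializable {
        final double[] stats;
        VarianceCalculator(double[] s) { stats = s; }
    }

    // Guess at o.a.s.mllib.tree.model.ImpurityStats:
    // Double, Double, 3x VarianceCalculator, Boolean
    static class ImpurityStats implements Serializable {
        final double gain, impurity;
        final VarianceCalculator parent, left, right;
        final boolean valid;
        ImpurityStats(double g, double i, VarianceCalculator p,
                      VarianceCalculator l, VarianceCalculator r, boolean v) {
            gain = g; impurity = i; parent = p; left = l; right = r; valid = v;
        }
    }

    // Builds one instance matching the tree: a Tuple2[1] array whose single
    // element is (Int, (ContinuousSplit, ImpurityStats)). Values are made up.
    @SuppressWarnings("unchecked")
    static Pair<Integer, Pair<ContinuousSplit, ImpurityStats>>[] build() {
        ImpurityStats stats = new ImpurityStats(
            0.5, 0.25,
            new VarianceCalculator(new double[] {1.0, 2.0, 3.0}),
            new VarianceCalculator(new double[] {1.0, 2.0, 3.0}),
            new VarianceCalculator(new double[] {1.0, 2.0, 3.0}),
            true);
        ContinuousSplit split = new ContinuousSplit(7, false);
        return new Pair[] { new Pair<>(42, new Pair<>(split, stats)) };
    }

    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(o);
        }
        return bos.toByteArray();
    }

    static Object deserialize(byte[] bytes)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(
                new ByteArrayInputStream(bytes))) {
            return ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] bytes = serialize(build());
        Object[] copy = (Object[]) deserialize(bytes);
        System.out.println("round-trip ok: " + copy.length
                + " element(s), " + bytes.length + " bytes");
    }
}
```

In a JMH benchmark, build() would run once in a @Setup method and the
serialize/deserialize calls would be the measured @Benchmark bodies.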
Regards,
Ogata