Hi Peter,

> If you could extract from 
> the benchmark just the approximate shape of the data structure and 
> typical values it contains, I could create a JMH benchmark that tests 
> just that part. Which would be appropriate to tune serialization code. 

My colleague investigated the objects serialized/deserialized in the 
benchmark.  He found that there are several types of object trees, and 
one of the largest object trees looks like this (in Scala types):

scala.Tuple2[1]
 `-scala.Tuple2
    +-Int
    `-scala.Tuple2
       +-org.apache.spark.ml.tree.ContinuousSplit
       |  +-Int
       |  `-Boolean
       `-org.apache.spark.mllib.tree.model.ImpurityStats
          +-Double
          +-Double
          +-org.apache.spark.mllib.tree.impurity.VarianceCalculator
          |  `-Double[3]
          +-org.apache.spark.mllib.tree.impurity.VarianceCalculator
          |  `-Double[3]
          +-org.apache.spark.mllib.tree.impurity.VarianceCalculator
          |  `-Double[3]
          `-Boolean
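
As a rough starting point for a benchmark, the shape above could be 
mimicked with plain Java stand-in classes like the ones below.  This is 
only a sketch under several assumptions: the class names (Pair, Split, 
Calculator, Stats) and all field names are hypothetical and are not the 
real Spark or Scala classes, "X[n]" is read as an array of length n, 
Double[3] is assumed to be a primitive double[], and every value is a 
placeholder rather than data taken from the benchmark.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class TreeShapeSketch {

    // Hypothetical stand-in for scala.Tuple2. Both fields are typed Object,
    // so an int stored here is boxed (an assumption about the real layout).
    static class Pair implements Serializable {
        private static final long serialVersionUID = 1L;
        final Object first;
        final Object second;
        Pair(Object first, Object second) { this.first = first; this.second = second; }
    }

    // Hypothetical stand-in for ContinuousSplit: one Int and one Boolean.
    static class Split implements Serializable {
        private static final long serialVersionUID = 1L;
        final int intField;
        final boolean flag;
        Split(int intField, boolean flag) { this.intField = intField; this.flag = flag; }
    }

    // Hypothetical stand-in for VarianceCalculator: assumed to wrap a double[3].
    static class Calculator implements Serializable {
        private static final long serialVersionUID = 1L;
        final double[] stats;
        Calculator(double a, double b, double c) { this.stats = new double[] {a, b, c}; }
    }

    // Hypothetical stand-in for ImpurityStats: two Doubles, three calculators, one Boolean.
    static class Stats implements Serializable {
        private static final long serialVersionUID = 1L;
        final double d1;
        final double d2;
        final Calculator c1;
        final Calculator c2;
        final Calculator c3;
        final boolean flag;
        Stats(double d1, double d2, Calculator c1, Calculator c2, Calculator c3, boolean flag) {
            this.d1 = d1; this.d2 = d2;
            this.c1 = c1; this.c2 = c2; this.c3 = c3;
            this.flag = flag;
        }
    }

    // Builds a one-element array holding the tree; all values are placeholders.
    static Pair[] buildTree() {
        Stats stats = new Stats(0.0, 0.0,
                new Calculator(0.0, 0.0, 0.0),
                new Calculator(0.0, 0.0, 0.0),
                new Calculator(0.0, 0.0, 0.0),
                false);
        Pair inner = new Pair(new Split(0, false), stats);
        Pair outer = new Pair(Integer.valueOf(0), inner);
        return new Pair[] { outer };
    }

    // Serializes an object to a byte array and reads it back with Java serialization.
    static Object roundTrip(Object obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Pair[] copy = (Pair[]) roundTrip(buildTree());
        System.out.println("array length after round trip: " + copy.length);
    }
}
```

In a JMH benchmark, buildTree() would presumably be called once from a 
@Setup method and roundTrip() (or its two halves separately) from the 
@Benchmark methods.  The stand-ins would need to be replaced once the 
real Java types are known.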


Now the question is what the Java classes (including the class 
hierarchy) look like, because Scala types may need extra boxing/unboxing 
(though I'm not confident about that).  I'll try to decompile the .class 
files generated from the Scala code and find the real Java types.


Regards,
Ogata
