Hi Ogata,

Thanks, I'll try a Java equivalent of the structure below. I can change it later if needed. Stay tuned...

Regards, Peter


On 09/21/2017 11:39 AM, Kazunori Ogata wrote:
Hi Peter,

If you could extract from the benchmark just the approximate shape of
the data structure and the typical values it contains, I could create a
JMH benchmark that tests just that part, which would be a suitable
basis for tuning the serialization code.
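
For concreteness, a minimal JMH harness along these lines could look
like the sketch below.  This is only an assumption about the shape: the
class and field names are made up, and plain Java serialization is used
as a stand-in for whatever serializer the benchmark actually exercises.

import java.io.*;
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;

// Measures the serialize/deserialize round-trip cost of a small,
// hand-built object resembling the tree described below.
@State(Scope.Thread)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
public class SplitStatsSerializationBench {

    // Hypothetical stand-in for the extracted object tree.
    static class SplitStats implements Serializable {
        int featureIndex;
        boolean isLeft;
        double gain;
        double[] parentStats;
        double[] leftStats;
        double[] rightStats;

        SplitStats(int featureIndex, boolean isLeft, double gain,
                   double[] parentStats, double[] leftStats,
                   double[] rightStats) {
            this.featureIndex = featureIndex;
            this.isLeft = isLeft;
            this.gain = gain;
            this.parentStats = parentStats;
            this.leftStats = leftStats;
            this.rightStats = rightStats;
        }
    }

    SplitStats stats;

    @Setup
    public void setup() {
        stats = new SplitStats(3, true, 0.25,
                new double[]{5.0, 7.0, 9.0},
                new double[]{1.0, 2.0, 3.0},
                new double[]{4.0, 5.0, 6.0});
    }

    @Benchmark
    public Object roundTrip() throws Exception {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(stats);
        }
        try (ObjectInputStream ois = new ObjectInputStream(
                new ByteArrayInputStream(bos.toByteArray()))) {
            return ois.readObject();
        }
    }
}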
My colleague investigated the objects serialized/deserialized in the
benchmark.  He found there are several types of object trees, and one of
the largest looks like this (in Scala types):

scala.Tuple2[1]
  `-scala.Tuple2
     +-Int
     `-scala.Tuple2
        +-org.apache.spark.ml.tree.ContinuousSplit
        |  +-Int
        |  `-Boolean
        `-org.apache.spark.mllib.tree.model.ImpurityStats
           +-Double
           +-Double
           +-org.apache.spark.mllib.tree.impurity.VarianceCalculator
           |  `-Double[3]
           +-org.apache.spark.mllib.tree.impurity.VarianceCalculator
           |  `-Double[3]
           +-org.apache.spark.mllib.tree.impurity.VarianceCalculator
           |  `-Double[3]
           `-Boolean


Now the question is what the Java classes (including the class
hierarchy) look like, because the Scala types may need extra
boxing/unboxing (though I'm not certain).  I'll try to decompile the
.class files generated from the Scala code and find the real Java
types.
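
As a rough guess at what the decompiler will show (the field names
below are made up, and the real Spark classes may well differ), the
Java shapes could come out something like this, with fields declared as
Scala Int/Boolean/Array[Double] staying unboxed primitives while
anything held in a Tuple2 goes through Object fields:

// Hypothetical field names; only the class names come from the tree above.
class ContinuousSplit {
    int featureIndex;      // Scala Int field -> primitive int
    boolean someFlag;      // Scala Boolean field -> primitive boolean
}

class VarianceCalculator {
    double[] stats;        // Scala Array[Double] -> unboxed double[]
}

class ImpurityStats {
    double gain;
    double impurity;
    VarianceCalculator calculator;
    VarianceCalculator leftCalculator;
    VarianceCalculator rightCalculator;
    boolean valid;
}

// scala.Tuple2 is generic on the JVM, so its _1/_2 fields are Objects;
// an Int held directly in a Tuple2 would show up boxed as
// java.lang.Integer (unless one of Scala's @specialized variants kicks in).
// Tuple2<Integer, Tuple2<ContinuousSplit, ImpurityStats>> entry = ...;

Checking with javap -p on the generated .class files should confirm or
refute this guess.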


Regards,
Ogata

