Hi Ogata,

On 09/20/2017 12:12 PM, Kazunori Ogata wrote:
Hi Peter,

The benchmark is GradientBoostingTree of Intel HiBench [1].  HiBench is a
suite of programs using Hadoop or Spark, and GradientBoostingTree is a
Spark program.  The source code (in Scala) is [2].  To build the code, you
need Apache Spark.

The command line is equivalent to java -Xmx10g -Dspark.master="local[4]"
GradientBoostingTree <inputDir> 100, but what I actually use is a Java
program that calls the main method and measures its execution time using
System.currentTimeMillis().
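
A harness of the kind described could look roughly like the sketch below. It is only an illustration, not the program actually used: the class name MainTimer and the method timeMain are made up here, and the benchmark class is passed in by name so that nothing about its package is assumed.

```java
import java.lang.reflect.Method;
import java.util.Arrays;

// Hypothetical timing harness: MainTimer and timeMain are illustrative names.
public class MainTimer {

    // Invokes the public static main(String[]) of the named class and
    // returns the elapsed wall-clock time in milliseconds.
    public static long timeMain(String className, String... args) {
        try {
            Method main = Class.forName(className).getMethod("main", String[].class);
            long start = System.currentTimeMillis();
            // Cast to Object so the array is passed as one argument, not spread as varargs.
            main.invoke(null, (Object) args);
            return System.currentTimeMillis() - start;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("could not run " + className, e);
        }
    }

    public static void main(String[] argv) {
        if (argv.length < 1) {
            System.out.println("usage: MainTimer <mainClass> [args...]");
            return;
        }
        String[] rest = Arrays.copyOfRange(argv, 1, argv.length);
        System.out.println(timeMain(argv[0], rest) + " ms");
    }
}
```

It would be started with the same JVM options as above, with the benchmark's fully qualified class name followed by <inputDir> 100 as arguments.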

By the way, I'm running the benchmark on a POWER8 machine.  Removing
volatile won't change the performance on x86.


[1] https://github.com/intel-hadoop/HiBench
[2]
https://github.com/intel-hadoop/HiBench/blob/master/sparkbench/ml/src/main/scala/com/intel/sparkbench/ml/GradientBoostingTree.scala


Regards,
Ogata


Huh, I thought it would be something easier to run. Am I right that the improvement we are expecting comes from Java serialization and deserialization of some data structure? If you could extract from the benchmark just the approximate shape of that data structure and the typical values it contains, I could create a JMH benchmark that tests just that part, which would be appropriate for tuning the serialization code. Once the best variant is chosen, you could verify it by running your test in your Spark setup.

I think there is still room for improvement. I have a few ideas I would like to test.
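
To make the idea concrete, here is a minimal sketch of the kind of round trip such a benchmark would exercise. Everything in it is an assumption: the Sample class (a label plus a feature array) is only a placeholder until the real shape of the data is known, and the timing loop uses plain System.nanoTime() rather than JMH so that it runs without extra dependencies. A real measurement would move serialize and deserialize into JMH benchmark methods to get proper warm-up and statistics.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class SerializationRoundTrip {

    // Hypothetical placeholder; the real shape is whatever the benchmark serializes.
    public static final class Sample implements Serializable {
        private static final long serialVersionUID = 1L;
        public final double label;
        public final double[] features;

        public Sample(double label, double[] features) {
            this.label = label;
            this.features = features;
        }
    }

    // Builds n samples with dims features each, filled with arbitrary values.
    public static Sample[] sampleData(int n, int dims) {
        Sample[] data = new Sample[n];
        for (int i = 0; i < n; i++) {
            double[] f = new double[dims];
            for (int j = 0; j < dims; j++) {
                f[j] = i * 0.5 + j;
            }
            data[i] = new Sample(i % 2, f);
        }
        return data;
    }

    // Serializes an object graph to a byte array using standard Java serialization.
    public static byte[] serialize(Object o) {
        try (ByteArrayOutputStream bos = new ByteArrayOutputStream();
             ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(o);
            oos.flush();
            return bos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads back a single object graph from a byte array.
    public static Object deserialize(byte[] bytes) {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return ois.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Sample[] data = sampleData(10_000, 16);
        int rounds = 20;
        long start = System.nanoTime();
        for (int i = 0; i < rounds; i++) {
            deserialize(serialize(data));
        }
        long avgMicros = (System.nanoTime() - start) / rounds / 1_000;
        System.out.println("average round trip: " + avgMicros + " us");
    }
}
```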

Regards, Peter
