I ran SparkKMeans on a large file (~7 GB of data) for one iteration with
spark-0.8.0, with this line in my .bashrc:

    export _JAVA_OPTIONS="-Xmx15g -Xms15g -verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails"

It finished in a decent time (~50 seconds), and Java printed only a few
"Full GC ..." messages (4-5 at most).

Now, with the same export in .bashrc but with spark-1.0.0 (launched via
spark-submit), the first iteration never finishes and I get many messages
like:

    18.537: [GC (Allocation Failure) --[PSYoungGen: 11796992K->11796992K(13762560K)] 11797442K->11797450K(13763072K), 2.8420311 secs] [Times: user=5.81 sys=2.12, real=2.85 secs]

or

    31.867: [Full GC (Ergonomics) [PSYoungGen: 11796992K->3177967K(13762560K)] [ParOldGen: 505K->505K(512K)] 11797497K->3178473K(13763072K), [Metaspace: 37646K->37646K(1081344K)], 2.3053283 secs] [Times: user=37.74 sys=0.11, real=2.31 secs]
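
These log lines follow the HotSpot -XX:+PrintGCDetails layout for the Parallel collector. Each "XK->YK(ZK)" triple gives occupancy before the collection, occupancy after it, and current capacity: first for the young generation (PSYoungGen), then for the whole heap. The text in parentheses after "GC" or "Full GC" is the reason the collection was triggered. Below is a minimal Python sketch that extracts those fields. It assumes exactly the two line shapes quoted here; other collectors and JVM versions print different layouts. `parse_gc_line` and `GcEvent` are hypothetical names, not part of any existing tool.

```python
import re
from dataclasses import dataclass

# Patterns for the -XX:+PrintGCDetails lines quoted above (Parallel collector,
# PSYoungGen/ParOldGen). Assumption: other collectors/JVMs use other formats.
_HEADER = re.compile(r"^(\d+\.\d+): \[(Full GC|GC) \(([^)]+)\)")
_YOUNG = re.compile(r"\[PSYoungGen: (\d+)K->(\d+)K\((\d+)K\)\]")
# Whole-heap "before->after(capacity)" follows a "] " and ends with a comma.
_HEAP = re.compile(r"\] (\d+)K->(\d+)K\((\d+)K\),")
# Pause time of the collection itself, e.g. ", 2.8420311 secs]".
_PAUSE = re.compile(r", ([\d.]+) secs\]")


@dataclass
class GcEvent:
    timestamp_s: float      # seconds since JVM start (-XX:+PrintGCTimeStamps)
    full: bool              # True for "Full GC", False for a young collection
    cause: str              # e.g. "Allocation Failure" or "Ergonomics"
    young_before_k: int
    young_after_k: int
    young_capacity_k: int
    heap_before_k: int
    heap_after_k: int
    heap_capacity_k: int
    pause_s: float

    @property
    def reclaimed_k(self) -> int:
        # Heap space freed by this collection, in KB (negative if it grew).
        return self.heap_before_k - self.heap_after_k


def parse_gc_line(line: str) -> GcEvent:
    """Hypothetical helper: parse one PrintGCDetails line of the shapes above."""
    line = line.strip()
    header = _HEADER.search(line)
    young = _YOUNG.search(line)
    heap = _HEAP.search(line)
    pause = _PAUSE.search(line)
    if not (header and young and heap and pause):
        raise ValueError(f"unrecognized GC line: {line!r}")
    return GcEvent(
        timestamp_s=float(header.group(1)),
        full=header.group(2) == "Full GC",
        cause=header.group(3),
        young_before_k=int(young.group(1)),
        young_after_k=int(young.group(2)),
        young_capacity_k=int(young.group(3)),
        heap_before_k=int(heap.group(1)),
        heap_after_k=int(heap.group(2)),
        heap_capacity_k=int(heap.group(3)),
        pause_s=float(pause.group(1)),
    )
```

For the two lines above, the sketch shows that the young collection frees nothing: young-generation occupancy stays at 11796992K, and the heap actually grows by 8 KB. The Full GC frees 8619024 KB, about 8.2 GB.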
 
I tried passing different JVM parameters through spark-submit, but the
results are the same. This happens with both Java 1.7 and Java 1.8.
I also do not know what "Ergonomics" stands for.

How can I get decent performance from spark-1.0.0, given that spark-0.8.0
needed no garbage-collection tuning (the defaults worked well)?

Thank you
