I ran SparkKMeans on a large file (~7 GB of data) for one iteration with spark-0.8.0, with this line in bash.rc:

export _JAVA_OPTIONS="-Xmx15g -Xms15g -verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails"

It finished in a decent time (~50 seconds), and Java printed only a few "Full GC...." messages (4-5 at most).

Now, using the same export in bash.rc but with spark-1.0.0 (and running it with spark-submit), the first loop never finishes and I get a lot of messages like:

18.537: [GC (Allocation Failure) --[PSYoungGen: 11796992K->11796992K(13762560K)] 11797442K->11797450K(13763072K), 2.8420311 secs] [Times: user=5.81 sys=2.12, real=2.85 secs]

or

31.867: [Full GC (Ergonomics) [PSYoungGen: 11796992K->3177967K(13762560K)] [ParOldGen: 505K->505K(512K)] 11797497K->3178473K(13763072K), [Metaspace: 37646K->37646K(1081344K)], 2.3053283 secs] [Times: user=37.74 sys=0.11, real=2.31 secs]

I tried passing different JVM parameters through spark-submit, but the results are the same. This happens with both Java 1.7 and Java 1.8. I do not know what "Ergonomics" stands for.

How can I get decent performance from spark-1.0.0, considering that spark-0.8.0 did not need any fine-tuning of the garbage collection method (the default worked well)?

Thank you
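
To make the GC messages quoted above easier to compare, here is a minimal sketch that pulls the per-generation sizes out of one log line. It assumes each bracketed entry follows the usual -XX:+PrintGCDetails layout of [Name: beforeK->afterK(capacityK)]; the names GEN_RE and parse_generations are my own, not part of Spark or the JVM.

```python
import re

# Matches one generation entry such as "[PSYoungGen: 11796992K->3177967K(13762560K)]".
# Assumed layout: [Name: <used before>K-><used after>K(<capacity>K)]
GEN_RE = re.compile(r"\[(\w+): (\d+)K->(\d+)K\((\d+)K\)\]")


def parse_generations(line):
    """Return {generation: (before_kb, after_kb, capacity_kb)} for one GC log line."""
    return {
        name: (int(before), int(after), int(capacity))
        for name, before, after, capacity in GEN_RE.findall(line)
    }


# The "Full GC (Ergonomics)" line from the post, copied verbatim.
full_gc = (
    "31.867: [Full GC (Ergonomics) [PSYoungGen: 11796992K->3177967K(13762560K)] "
    "[ParOldGen: 505K->505K(512K)] 11797497K->3178473K(13763072K), "
    "[Metaspace: 37646K->37646K(1081344K)], 2.3053283 secs]"
)

for name, (before, after, capacity) in parse_generations(full_gc).items():
    print(f"{name}: {before} KB -> {after} KB (capacity {capacity} KB)")
```

Read this way, the line shows the old generation (ParOldGen) with a capacity of only 512 KB, while the young generation (PSYoungGen) has a capacity of about 13 GB.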