With spark-1.0.0, this is the cmdline from /proc/<pid>/cmdline (with the export
line export _JAVA_OPTIONS=...):
I ran SparkKMeans with a big file (~7 GB of data) for one iteration with
spark-0.8.0, with this line in .bashrc: export _JAVA_OPTIONS="-Xmx15g -Xms15g
-verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails". It finished in a
decent time, ~50 seconds, and I saw only a few Full GC messages.
Try looking at the running processes with “ps” to see their full command lines
and check whether any options differ. It seems like in both cases your young
generation is quite large (11 GB), which doesn’t make a lot of sense with a
heap of 15 GB. But maybe I’m misreading something.
Matei
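The "compare the command lines" check suggested above can be scripted. What follows is a minimal sketch, assuming a Linux host with /proc, and every helper name in it (read_cmdline, read_environ_var, jvm_options, diff_options, parse_size) is hypothetical, not part of Spark or the JVM. One detail worth knowing: flags set through _JAVA_OPTIONS are read by the JVM from the environment, so they show up in /proc/<pid>/environ rather than in /proc/<pid>/cmdline or in ps output. Both files are NUL-separated, which is why a plain cat prints the arguments run together.

```python
import os
from typing import Dict, List, Optional, Tuple


def read_cmdline(pid: int) -> List[str]:
    """Return the argv of a running process (Linux only).

    /proc/<pid>/cmdline is NUL-separated, so a plain `cat` shows the
    arguments run together.
    """
    with open(f"/proc/{pid}/cmdline", "rb") as f:
        raw = f.read()
    return [arg.decode() for arg in raw.split(b"\0") if arg]


def read_environ_var(pid: int, name: str) -> Optional[str]:
    """Return one environment variable of a running process, e.g. _JAVA_OPTIONS.

    Flags passed via _JAVA_OPTIONS live in the environment, not in argv.
    """
    with open(f"/proc/{pid}/environ", "rb") as f:
        for entry in f.read().split(b"\0"):
            key, sep, value = entry.partition(b"=")
            if sep and key.decode() == name:
                return value.decode()
    return None


def jvm_options(args: List[str]) -> Dict[str, str]:
    """Map JVM flags to their full text, keyed so that -Xmx15g vs -Xmx8g
    (or -XX:+Foo vs -XX:-Foo) are treated as the same option."""
    opts: Dict[str, str] = {}
    for arg in args:
        if arg.startswith("-XX:"):
            body = arg[4:]
            if body and body[0] in "+-":
                key = "-XX:" + body[1:]               # boolean flag
            else:
                key = "-XX:" + body.split("=", 1)[0]  # -XX:Name=value
            opts[key] = arg
        elif arg[:4] in ("-Xmx", "-Xms", "-Xmn", "-Xss"):
            opts[arg[:4]] = arg                       # sized flags
        elif arg.startswith(("-X", "-verbose")):
            opts[arg] = arg                           # other -X / -verbose flags
    return opts


def diff_options(
    a: Dict[str, str], b: Dict[str, str]
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return only the options that differ (None means 'not set')."""
    return {
        k: (a.get(k), b.get(k))
        for k in sorted(set(a) | set(b))
        if a.get(k) != b.get(k)
    }


def parse_size(text: str) -> int:
    """Convert a JVM size such as '15g' or '512m' to bytes."""
    units = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}
    text = text.lower()
    if text[-1] in units:
        return int(text[:-1]) * units[text[-1]]
    return int(text)
```

For example, with two hypothetical executor PIDs, diff_options(jvm_options(read_cmdline(pid_a)), jvm_options(read_cmdline(pid_b))) lists only the flags that differ; splitting read_environ_var(pid, "_JAVA_OPTIONS") on whitespace and passing it through jvm_options covers the environment-supplied flags. On the sizing point, parse_size("11g") / parse_size("15g") is about 0.73, so an 11 GB young generation would leave only about 4 GB of a 15 GB heap for the old generation.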