[
https://issues.apache.org/jira/browse/SPARK-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14324130#comment-14324130
]
Dr. Christian Betz edited comment on SPARK-5081 at 2/17/15 12:31 PM:
---------------------------------------------------------------------
I checked my assumption that the CDH version and the Spark 1.1.0 version show
the same behavior. That holds for the reported shuffle spills; Spark 1.1.0 also
reports:
||Shuffle Spill (Memory)||Shuffle Spill (Disk)||
|0.0 B|0.0 B|
However, I see lots of small spills like this:
org.apache.spark.util.collection.ExternalAppendOnlyMap : Thread 78 spilling in-memory map of 1 MB to disk (322 times so far)
That's taking about the same time per task (several minutes instead of tens of
seconds).
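To put a number on these small spills, the messages can be tallied per thread. This is a minimal sketch in Python (chosen only for brevity; the job itself runs on the JVM). The regular expression is modelled solely on the log line quoted above, and `summarize_spills` is a hypothetical helper name, not part of Spark.

```python
import re
from collections import defaultdict

# Pattern modelled on the single log line quoted above (an assumption);
# other Spark versions may word the message differently.
SPILL_RE = re.compile(
    r"Thread (?P<thread>\d+)\s+spilling in-memory map of "
    r"(?P<size>\d+(?:\.\d+)?) (?P<unit>B|KB|MB|GB) to disk "
    r"\((?P<count>\d+) times? so far\)"
)

UNIT_BYTES = {"B": 1, "KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}


def summarize_spills(log_text):
    """Return {thread_id: (spill_messages_seen, total_bytes_spilled)}."""
    counts = defaultdict(int)
    totals = defaultdict(float)
    # Collapse whitespace so a message wrapped across lines still matches.
    normalized = " ".join(log_text.split())
    for match in SPILL_RE.finditer(normalized):
        thread = int(match.group("thread"))
        counts[thread] += 1
        totals[thread] += float(match.group("size")) * UNIT_BYTES[match.group("unit")]
    return {thread: (counts[thread], int(totals[thread])) for thread in counts}
```

Feeding the executor log through `summarize_spills` gives, per thread, how many spill messages appeared and roughly how many bytes they covered, which makes it easier to compare two runs than reading the 0.0 B values in the WebUI table.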
That leaves several possible explanations:
* Spark 1.1.0 does not report the same shuffle writes to memory and disk as
Spark 1.2.1 does, which misleads us.
* It's not an issue in the Spark code but in the changed dependencies. I'm
running Spark 1.1.0 against hadoop-client/hdfs 2.6.0, so it might come from there.
Here's the diff from the two runs:
*Classpath entries only in CDH-Version:*
* /com/jamesmurty/utils/java-xmlbuilder/0.4/java-xmlbuilder-0.4.jar
*Classpath entries only in 1.1.0/Hadoop-2.6.0-version:*
* /asm/asm/3.1/asm-3.1.jar
* /com/sun/jersey/jersey-server/1.9/jersey-server-1.9.jar
* /commons-daemon/commons-daemon/1.0.13/commons-daemon-1.0.13.jar
* /commons-el/commons-el/1.0/commons-el-1.0.jar
* /javax/servlet/jsp/jsp-api/2.1/jsp-api-2.1.jar
* /org/htrace/htrace-core/3.0.4/htrace-core-3.0.4.jar
* /tomcat/jasper-runtime/5.5.23/jasper-runtime-5.5.23.jar
* /xerces/xercesImpl/2.9.1/xercesImpl-2.9.1.jar
* /xml-apis/xml-apis/1.3.04/xml-apis-1.3.04.jar
*Classpath entries with changes:*
* /org/apache/spark/spark-core_2.10/1.1.0/spark-core_2.10-1.1.0.jar ->
/org/apache/spark/spark-core_2.10/1.1.0-cdh5.2.0/spark-core_2.10-1.1.0-cdh5.2.0.jar
* /net/java/dev/jets3t/jets3t/0.7.1/jets3t-0.7.1.jar ->
/net/java/dev/jets3t/jets3t/0.9.0/jets3t-0.9.0.jar
* /org/apache/httpcomponents/httpclient/4.2.5/httpclient-4.2.5.jar ->
/org/apache/httpcomponents/httpclient/4.1.2/httpclient-4.1.2.jar
* /org/apache/httpcomponents/httpcore/4.2.4/httpcore-4.2.4.jar ->
/org/apache/httpcomponents/httpcore/4.1.2/httpcore-4.1.2.jar
* /org/codehaus/jackson/jackson-core-asl/1.9.13/jackson-core-asl-1.9.13.jar ->
1.8.8 (same with other jackson libs)
* /org/codehaus/jackson/jackson-jaxrs/1.9.13/jackson-jaxrs-1.9.13.jar
* /org/codehaus/jackson/jackson-mapper-asl/1.9.13/jackson-mapper-asl-1.9.13.jar
* /org/codehaus/jackson/jackson-xc/1.9.13/jackson-xc-1.9.13.jar
* /org/apache/hadoop/hadoop-annotations/2.6.0/hadoop-annotations-2.6.0.jar ->
...-2.5.0-cdh5.2.0.jar (same below)
* /org/apache/hadoop/hadoop-auth/2.6.0/hadoop-auth-2.6.0.jar
* /org/apache/hadoop/hadoop-client/2.6.0/hadoop-client-2.6.0.jar
* /org/apache/hadoop/hadoop-common/2.6.0/hadoop-common-2.6.0.jar
* /org/apache/hadoop/hadoop-hdfs/2.6.0/hadoop-hdfs-2.6.0.jar
* /org/apache/hadoop/hadoop-mapreduce-client-app/2.6.0/hadoop-mapreduce-client-app-2.6.0.jar
* /org/apache/hadoop/hadoop-mapreduce-client-common/2.6.0/hadoop-mapreduce-client-common-2.6.0.jar
* /org/apache/hadoop/hadoop-mapreduce-client-core/2.6.0/hadoop-mapreduce-client-core-2.6.0.jar
* /org/apache/hadoop/hadoop-mapreduce-client-jobclient/2.6.0/hadoop-mapreduce-client-jobclient-2.6.0.jar
* /org/apache/hadoop/hadoop-mapreduce-client-shuffle/2.6.0/hadoop-mapreduce-client-shuffle-2.6.0.jar
* /org/apache/hadoop/hadoop-yarn-api/2.6.0/hadoop-yarn-api-2.6.0.jar
* /org/apache/hadoop/hadoop-yarn-client/2.6.0/hadoop-yarn-client-2.6.0.jar
* /org/apache/hadoop/hadoop-yarn-common/2.6.0/hadoop-yarn-common-2.6.0.jar
* /org/apache/hadoop/hadoop-yarn-server-common/2.6.0/hadoop-yarn-server-common-2.6.0.jar
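A diff like the one above can be reproduced from the two WebUI classpath listings. This is a rough sketch in Python, assuming every jar path follows the Maven repository layout `/group/path/artifact/version/file.jar`; `parse_entry` and `diff_classpaths` are hypothetical helper names introduced only for illustration.

```python
def parse_entry(path):
    """Split a repository-style jar path into (group/artifact, version).

    Entries that are not jars (e.g. source directories) are keyed by the
    full path and carry no version.
    """
    cleaned = path.strip()
    parts = cleaned.strip("/").split("/")
    if len(parts) >= 4 and parts[-1].endswith(".jar"):
        return "/".join(parts[:-2]), parts[-2]
    return cleaned, None


def diff_classpaths(left, right):
    """Return (only_in_left, only_in_right, changed) for two path lists.

    `changed` holds (artifact, left_version, right_version) tuples.
    """
    left_map = dict(parse_entry(p) for p in left if p.strip())
    right_map = dict(parse_entry(p) for p in right if p.strip())
    only_left = sorted(k for k in left_map if k not in right_map)
    only_right = sorted(k for k in right_map if k not in left_map)
    changed = sorted(
        (k, left_map[k], right_map[k])
        for k in left_map
        if k in right_map and left_map[k] != right_map[k]
    )
    return only_left, only_right, changed
```

Keying on the path without version and file name is what lets a version bump show up as one "changed" entry instead of one removal plus one addition. Jars that differ only by classifier under the same version collapse into a single key in this sketch.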
Here's my dependency list (from WebUI) for Spark 1.1.0 with hadoop 2.6.0:
/asm/asm/3.1/asm-3.1.jar
/cheshire/cheshire/5.3.1/cheshire-5.3.1.jar
/cider/cider-nrepl/0.8.2/cider-nrepl-0.8.2.jar
/clj-logging-config/clj-logging-config/1.9.12/clj-logging-config-1.9.12.jar
/clj-time/clj-time/0.8.0/clj-time-0.8.0.jar
/cljs-tooling/cljs-tooling/0.1.3/cljs-tooling-0.1.3.jar
/colt/colt/1.2.0/colt-1.2.0.jar
/com/clearspring/analytics/stream/2.7.0/stream-2.7.0.jar
/com/codahale/metrics/metrics-core/3.0.0/metrics-core-3.0.0.jar
/com/codahale/metrics/metrics-graphite/3.0.0/metrics-graphite-3.0.0.jar
/com/codahale/metrics/metrics-json/3.0.0/metrics-json-3.0.0.jar
/com/codahale/metrics/metrics-jvm/3.0.0/metrics-jvm-3.0.0.jar
/com/damballa/abracad/0.4.11/abracad-0.4.11.jar
/com/damballa/parkour/0.6.1/parkour-0.6.1.jar
/com/esotericsoftware/kryo/kryo/2.21/kryo-2.21.jar
/com/esotericsoftware/minlog/minlog/1.2/minlog-1.2.jar
/com/esotericsoftware/reflectasm/reflectasm/1.07/reflectasm-1.07-shaded.jar
/com/fasterxml/jackson/core/jackson-annotations/2.3.0/jackson-annotations-2.3.0.jar
/com/fasterxml/jackson/core/jackson-core/2.3.1/jackson-core-2.3.1.jar
/com/fasterxml/jackson/core/jackson-databind/2.3.1/jackson-databind-2.3.1.jar
/com/fasterxml/jackson/dataformat/jackson-dataformat-smile/2.3.1/jackson-dataformat-smile-2.3.1.jar
/com/google/code/findbugs/jsr305/1.3.9/jsr305-1.3.9.jar
/com/google/code/gson/gson/2.2.4/gson-2.2.4.jar
/com/google/guava/guava/14.0.1/guava-14.0.1.jar
/com/google/protobuf/protobuf-java/2.5.0/protobuf-java-2.5.0.jar
/com/ning/compress-lzf/1.0.0/compress-lzf-1.0.0.jar
/com/sun/jersey/jersey-client/1.9/jersey-client-1.9.jar
/com/sun/jersey/jersey-core/1.9/jersey-core-1.9.jar
/com/sun/jersey/jersey-server/1.9/jersey-server-1.9.jar
/com/thoughtworks/paranamer/paranamer/2.3/paranamer-2.3.jar
/com/twitter/carbonite/1.4.0/carbonite-1.4.0.jar
/com/twitter/chill-java/0.3.6/chill-java-0.3.6.jar
/com/twitter/chill_2.10/0.5.2/chill_2.10-0.5.2.jar
/com/typesafe/config/1.0.2/config-1.0.2.jar
/commons-beanutils/commons-beanutils-core/1.8.0/commons-beanutils-core-1.8.0.jar
/commons-beanutils/commons-beanutils/1.7.0/commons-beanutils-1.7.0.jar
/commons-cli/commons-cli/1.2/commons-cli-1.2.jar
/commons-codec/commons-codec/1.4/commons-codec-1.4.jar
/commons-collections/commons-collections/3.2.1/commons-collections-3.2.1.jar
/commons-configuration/commons-configuration/1.6/commons-configuration-1.6.jar
/commons-daemon/commons-daemon/1.0.13/commons-daemon-1.0.13.jar
/commons-digester/commons-digester/1.8/commons-digester-1.8.jar
/commons-el/commons-el/1.0/commons-el-1.0.jar
/commons-httpclient/commons-httpclient/3.1/commons-httpclient-3.1.jar
/commons-io/commons-io/2.4/commons-io-2.4.jar
/commons-lang/commons-lang/2.6/commons-lang-2.6.jar
/commons-logging/commons-logging/1.1.3/commons-logging-1.1.3.jar
/commons-net/commons-net/2.2/commons-net-2.2.jar
/compliment/compliment/0.2.0/compliment-0.2.0.jar
/concurrent/concurrent/1.3.4/concurrent-1.3.4.jar
/gorillalabs/config/1.0.0/config-1.0.0.jar
/gorillalabs/sparkling/1.1.0/sparkling-1.1.0.jar
/io/netty/netty-all/4.0.23.Final/netty-all-4.0.23.Final.jar
/io/netty/netty/3.6.6.Final/netty-3.6.6.Final.jar
/javax/activation/activation/1.1/activation-1.1.jar
/javax/servlet/jsp/jsp-api/2.1/jsp-api-2.1.jar
/javax/xml/bind/jaxb-api/2.2.2/jaxb-api-2.2.2.jar
/javax/xml/stream/stax-api/1.0-2/stax-api-1.0-2.jar
/jline/jline/0.9.94/jline-0.9.94.jar
/joda-time/joda-time/2.3/joda-time-2.3.jar
/log4j/log4j/1.2.17/log4j-1.2.17.jar
/mysql/mysql-connector-java/5.1.31/mysql-connector-java-5.1.31.jar
/net/java/dev/jets3t/jets3t/0.7.1/jets3t-0.7.1.jar
/net/jpountz/lz4/lz4/1.2.0/lz4-1.2.0.jar
/net/sf/py4j/py4j/0.8.2.1/py4j-0.8.2.1.jar
/org/apache/avro/avro-ipc/1.7.7/avro-ipc-1.7.7-tests.jar
/org/apache/avro/avro-ipc/1.7.7/avro-ipc-1.7.7.jar
/org/apache/avro/avro-mapred/1.7.7/avro-mapred-1.7.7-hadoop2.jar
/org/apache/avro/avro/1.7.7/avro-1.7.7.jar
/org/apache/commons/commons-compress/1.4.1/commons-compress-1.4.1.jar
/org/apache/commons/commons-lang3/3.3.2/commons-lang3-3.3.2.jar
/org/apache/commons/commons-math3/3.1.1/commons-math3-3.1.1.jar
/org/apache/curator/curator-client/2.6.0/curator-client-2.6.0.jar
/org/apache/curator/curator-framework/2.4.0/curator-framework-2.4.0.jar
/org/apache/curator/curator-recipes/2.4.0/curator-recipes-2.4.0.jar
/org/apache/directory/api/api-asn1-api/1.0.0-M20/api-asn1-api-1.0.0-M20.jar
/org/apache/directory/api/api-util/1.0.0-M20/api-util-1.0.0-M20.jar
/org/apache/directory/server/apacheds-i18n/2.0.0-M15/apacheds-i18n-2.0.0-M15.jar
/org/apache/directory/server/apacheds-kerberos-codec/2.0.0-M15/apacheds-kerberos-codec-2.0.0-M15.jar
/org/apache/hadoop/hadoop-annotations/2.6.0/hadoop-annotations-2.6.0.jar
/org/apache/hadoop/hadoop-auth/2.6.0/hadoop-auth-2.6.0.jar
/org/apache/hadoop/hadoop-client/2.6.0/hadoop-client-2.6.0.jar
/org/apache/hadoop/hadoop-common/2.6.0/hadoop-common-2.6.0.jar
/org/apache/hadoop/hadoop-hdfs/2.6.0/hadoop-hdfs-2.6.0.jar
/org/apache/hadoop/hadoop-mapreduce-client-app/2.6.0/hadoop-mapreduce-client-app-2.6.0.jar
/org/apache/hadoop/hadoop-mapreduce-client-common/2.6.0/hadoop-mapreduce-client-common-2.6.0.jar
/org/apache/hadoop/hadoop-mapreduce-client-core/2.6.0/hadoop-mapreduce-client-core-2.6.0.jar
/org/apache/hadoop/hadoop-mapreduce-client-jobclient/2.6.0/hadoop-mapreduce-client-jobclient-2.6.0.jar
/org/apache/hadoop/hadoop-mapreduce-client-shuffle/2.6.0/hadoop-mapreduce-client-shuffle-2.6.0.jar
/org/apache/hadoop/hadoop-yarn-api/2.6.0/hadoop-yarn-api-2.6.0.jar
/org/apache/hadoop/hadoop-yarn-client/2.6.0/hadoop-yarn-client-2.6.0.jar
/org/apache/hadoop/hadoop-yarn-common/2.6.0/hadoop-yarn-common-2.6.0.jar
/org/apache/hadoop/hadoop-yarn-server-common/2.6.0/hadoop-yarn-server-common-2.6.0.jar
/org/apache/httpcomponents/httpclient/4.2.5/httpclient-4.2.5.jar
/org/apache/httpcomponents/httpcore/4.2.4/httpcore-4.2.4.jar
/org/apache/mesos/mesos/0.18.1/mesos-0.18.1-shaded-protobuf.jar
/org/apache/spark/spark-core_2.10/1.1.0/spark-core_2.10-1.1.0.jar
/org/apache/velocity/velocity/1.7/velocity-1.7.jar
/org/apache/zookeeper/zookeeper/3.4.5/zookeeper-3.4.5.jar
/org/clojure/clojure/1.6.0/clojure-1.6.0.jar
/org/clojure/core.cache/0.6.3/core.cache-0.6.3.jar
/org/clojure/core.memoize/0.5.6/core.memoize-0.5.6.jar
/org/clojure/data.priority-map/0.0.2/data.priority-map-0.0.2.jar
/org/clojure/java.classpath/0.2.0/java.classpath-0.2.0.jar
/org/clojure/java.jdbc/0.3.5/java.jdbc-0.3.5.jar
/org/clojure/math.numeric-tower/0.0.4/math.numeric-tower-0.0.4.jar
/org/clojure/tools.cli/0.3.1/tools.cli-0.3.1.jar
/org/clojure/tools.logging/0.3.1/tools.logging-0.3.1.jar
/org/clojure/tools.namespace/0.2.5/tools.namespace-0.2.5.jar
/org/clojure/tools.nrepl/0.2.5/tools.nrepl-0.2.5.jar
/org/clojure/tools.trace/0.7.8/tools.trace-0.7.8.jar
/org/codehaus/jackson/jackson-core-asl/1.9.13/jackson-core-asl-1.9.13.jar
/org/codehaus/jackson/jackson-jaxrs/1.9.13/jackson-jaxrs-1.9.13.jar
/org/codehaus/jackson/jackson-mapper-asl/1.9.13/jackson-mapper-asl-1.9.13.jar
/org/codehaus/jackson/jackson-xc/1.9.13/jackson-xc-1.9.13.jar
/org/eclipse/jetty/jetty-continuation/8.1.14.v20131031/jetty-continuation-8.1.14.v20131031.jar
/org/eclipse/jetty/jetty-http/8.1.14.v20131031/jetty-http-8.1.14.v20131031.jar
/org/eclipse/jetty/jetty-io/8.1.14.v20131031/jetty-io-8.1.14.v20131031.jar
/org/eclipse/jetty/jetty-jndi/8.1.14.v20131031/jetty-jndi-8.1.14.v20131031.jar
/org/eclipse/jetty/jetty-plus/8.1.14.v20131031/jetty-plus-8.1.14.v20131031.jar
/org/eclipse/jetty/jetty-security/8.1.14.v20131031/jetty-security-8.1.14.v20131031.jar
/org/eclipse/jetty/jetty-server/8.1.14.v20131031/jetty-server-8.1.14.v20131031.jar
/org/eclipse/jetty/jetty-servlet/8.1.14.v20131031/jetty-servlet-8.1.14.v20131031.jar
/org/eclipse/jetty/jetty-util/8.1.14.v20131031/jetty-util-8.1.14.v20131031.jar
/org/eclipse/jetty/jetty-webapp/8.1.14.v20131031/jetty-webapp-8.1.14.v20131031.jar
/org/eclipse/jetty/jetty-xml/8.1.14.v20131031/jetty-xml-8.1.14.v20131031.jar
/org/eclipse/jetty/orbit/javax.activation/1.1.0.v201105071233/javax.activation-1.1.0.v201105071233.jar
/org/eclipse/jetty/orbit/javax.mail.glassfish/1.4.1.v201005082020/javax.mail.glassfish-1.4.1.v201005082020.jar
/org/eclipse/jetty/orbit/javax.servlet/3.0.0.v201112011016/javax.servlet-3.0.0.v201112011016.jar
/org/eclipse/jetty/orbit/javax.transaction/1.1.1.v201105210645/javax.transaction-1.1.1.v201105210645.jar
/org/fusesource/leveldbjni/leveldbjni-all/1.8/leveldbjni-all-1.8.jar
/org/htrace/htrace-core/3.0.4/htrace-core-3.0.4.jar
/org/json4s/json4s-ast_2.10/3.2.10/json4s-ast_2.10-3.2.10.jar
/org/json4s/json4s-core_2.10/3.2.10/json4s-core_2.10-3.2.10.jar
/org/json4s/json4s-jackson_2.10/3.2.10/json4s-jackson_2.10-3.2.10.jar
/org/objenesis/objenesis/1.2/objenesis-1.2.jar
/org/ow2/asm/asm/4.0/asm-4.0.jar
/org/scala-lang/scala-compiler/2.10.0/scala-compiler-2.10.0.jar
/org/scala-lang/scala-library/2.10.4/scala-library-2.10.4.jar
/org/scala-lang/scala-reflect/2.10.0/scala-reflect-2.10.0.jar
/org/scala-lang/scalap/2.10.0/scalap-2.10.0.jar
/org/slf4j/jcl-over-slf4j/1.7.5/jcl-over-slf4j-1.7.5.jar
/org/slf4j/jul-to-slf4j/1.7.5/jul-to-slf4j-1.7.5.jar
/org/slf4j/slf4j-api/1.7.7/slf4j-api-1.7.7.jar
/org/slf4j/slf4j-log4j12/1.7.7/slf4j-log4j12-1.7.7.jar
/org/spark-project/akka/akka-actor_2.10/2.2.3-shaded-protobuf/akka-actor_2.10-2.2.3-shaded-protobuf.jar
/org/spark-project/akka/akka-remote_2.10/2.2.3-shaded-protobuf/akka-remote_2.10-2.2.3-shaded-protobuf.jar
/org/spark-project/akka/akka-slf4j_2.10/2.2.3-shaded-protobuf/akka-slf4j_2.10-2.2.3-shaded-protobuf.jar
/org/spark-project/protobuf/protobuf-java/2.4.1-shaded/protobuf-java-2.4.1-shaded.jar
/org/spark-project/pyrolite/2.0.1/pyrolite-2.0.1.jar
/org/tachyonproject/tachyon-client/0.5.0/tachyon-client-0.5.0.jar
/org/tachyonproject/tachyon/0.5.0/tachyon-0.5.0.jar
/org/tcrawley/dynapath/0.2.3/dynapath-0.2.3.jar
/org/tukaani/xz/1.0/xz-1.0.jar
/org/uncommons/maths/uncommons-maths/1.2.2a/uncommons-maths-1.2.2a.jar
/org/xerial/snappy/snappy-java/1.0.5.3/snappy-java-1.0.5.3.jar
/pjstadig/scopes/0.3.0/scopes-0.3.0.jar
/tigris/tigris/0.1.1/tigris-0.1.1.jar
/tomcat/jasper-runtime/5.5.23/jasper-runtime-5.5.23.jar
/transduce/transduce/0.1.1/transduce-0.1.1.jar
/xerces/xercesImpl/2.9.1/xercesImpl-2.9.1.jar
/xml-apis/xml-apis/1.3.04/xml-apis-1.3.04.jar
/xmlenc/xmlenc/0.52/xmlenc-0.52.jar
/spark/pacs-spark/resources
/spark/pacs-spark/src
/spark/pacs-spark/target/classes
/spark/pacs-spark/test
And here's my dependency list with Spark-1.1.0-CDH:
/cheshire/cheshire/5.3.1/cheshire-5.3.1.jar
/cider/cider-nrepl/0.8.2/cider-nrepl-0.8.2.jar
/clj-logging-config/clj-logging-config/1.9.12/clj-logging-config-1.9.12.jar
/clj-time/clj-time/0.8.0/clj-time-0.8.0.jar
/cljs-tooling/cljs-tooling/0.1.3/cljs-tooling-0.1.3.jar
/colt/colt/1.2.0/colt-1.2.0.jar
/com/clearspring/analytics/stream/2.7.0/stream-2.7.0.jar
/com/codahale/metrics/metrics-core/3.0.0/metrics-core-3.0.0.jar
/com/codahale/metrics/metrics-graphite/3.0.0/metrics-graphite-3.0.0.jar
/com/codahale/metrics/metrics-json/3.0.0/metrics-json-3.0.0.jar
/com/codahale/metrics/metrics-jvm/3.0.0/metrics-jvm-3.0.0.jar
/com/damballa/abracad/0.4.11/abracad-0.4.11.jar
/com/damballa/parkour/0.6.1/parkour-0.6.1.jar
/com/esotericsoftware/kryo/kryo/2.21/kryo-2.21.jar
/com/esotericsoftware/minlog/minlog/1.2/minlog-1.2.jar
/com/esotericsoftware/reflectasm/reflectasm/1.07/reflectasm-1.07-shaded.jar
/com/fasterxml/jackson/core/jackson-annotations/2.3.0/jackson-annotations-2.3.0.jar
/com/fasterxml/jackson/core/jackson-core/2.3.1/jackson-core-2.3.1.jar
/com/fasterxml/jackson/core/jackson-databind/2.3.1/jackson-databind-2.3.1.jar
/com/fasterxml/jackson/dataformat/jackson-dataformat-smile/2.3.1/jackson-dataformat-smile-2.3.1.jar
/com/google/code/findbugs/jsr305/1.3.9/jsr305-1.3.9.jar
/com/google/code/gson/gson/2.2.4/gson-2.2.4.jar
/com/google/guava/guava/14.0.1/guava-14.0.1.jar
/com/google/protobuf/protobuf-java/2.5.0/protobuf-java-2.5.0.jar
/com/jamesmurty/utils/java-xmlbuilder/0.4/java-xmlbuilder-0.4.jar
/com/ning/compress-lzf/1.0.0/compress-lzf-1.0.0.jar
/com/sun/jersey/jersey-client/1.9/jersey-client-1.9.jar
/com/sun/jersey/jersey-core/1.9/jersey-core-1.9.jar
/com/thoughtworks/paranamer/paranamer/2.3/paranamer-2.3.jar
/com/twitter/carbonite/1.4.0/carbonite-1.4.0.jar
/com/twitter/chill-java/0.3.6/chill-java-0.3.6.jar
/com/twitter/chill_2.10/0.5.2/chill_2.10-0.5.2.jar
/com/typesafe/config/1.0.2/config-1.0.2.jar
/commons-beanutils/commons-beanutils-core/1.8.0/commons-beanutils-core-1.8.0.jar
/commons-beanutils/commons-beanutils/1.7.0/commons-beanutils-1.7.0.jar
/commons-cli/commons-cli/1.2/commons-cli-1.2.jar
/commons-codec/commons-codec/1.4/commons-codec-1.4.jar
/commons-collections/commons-collections/3.2.1/commons-collections-3.2.1.jar
/commons-configuration/commons-configuration/1.6/commons-configuration-1.6.jar
/commons-digester/commons-digester/1.8/commons-digester-1.8.jar
/commons-httpclient/commons-httpclient/3.1/commons-httpclient-3.1.jar
/commons-io/commons-io/2.4/commons-io-2.4.jar
/commons-lang/commons-lang/2.6/commons-lang-2.6.jar
/commons-logging/commons-logging/1.1.3/commons-logging-1.1.3.jar
/commons-net/commons-net/2.2/commons-net-2.2.jar
/compliment/compliment/0.2.0/compliment-0.2.0.jar
/concurrent/concurrent/1.3.4/concurrent-1.3.4.jar
/gorillalabs/config/1.0.0/config-1.0.0.jar
/gorillalabs/sparkling/1.1.0/sparkling-1.1.0.jar
/io/netty/netty-all/4.0.23.Final/netty-all-4.0.23.Final.jar
/io/netty/netty/3.6.6.Final/netty-3.6.6.Final.jar
/javax/activation/activation/1.1/activation-1.1.jar
/javax/xml/bind/jaxb-api/2.2.2/jaxb-api-2.2.2.jar
/javax/xml/stream/stax-api/1.0-2/stax-api-1.0-2.jar
/jline/jline/0.9.94/jline-0.9.94.jar
/joda-time/joda-time/2.3/joda-time-2.3.jar
/log4j/log4j/1.2.17/log4j-1.2.17.jar
/mysql/mysql-connector-java/5.1.31/mysql-connector-java-5.1.31.jar
/net/java/dev/jets3t/jets3t/0.9.0/jets3t-0.9.0.jar
/net/jpountz/lz4/lz4/1.2.0/lz4-1.2.0.jar
/net/sf/py4j/py4j/0.8.2.1/py4j-0.8.2.1.jar
/org/apache/avro/avro-ipc/1.7.7/avro-ipc-1.7.7-tests.jar
/org/apache/avro/avro-ipc/1.7.7/avro-ipc-1.7.7.jar
/org/apache/avro/avro-mapred/1.7.7/avro-mapred-1.7.7-hadoop2.jar
/org/apache/avro/avro/1.7.7/avro-1.7.7.jar
/org/apache/commons/commons-compress/1.4.1/commons-compress-1.4.1.jar
/org/apache/commons/commons-lang3/3.3.2/commons-lang3-3.3.2.jar
/org/apache/commons/commons-math3/3.1.1/commons-math3-3.1.1.jar
/org/apache/curator/curator-client/2.6.0/curator-client-2.6.0.jar
/org/apache/curator/curator-framework/2.4.0/curator-framework-2.4.0.jar
/org/apache/curator/curator-recipes/2.4.0/curator-recipes-2.4.0.jar
/org/apache/directory/api/api-asn1-api/1.0.0-M20/api-asn1-api-1.0.0-M20.jar
/org/apache/directory/api/api-util/1.0.0-M20/api-util-1.0.0-M20.jar
/org/apache/directory/server/apacheds-i18n/2.0.0-M15/apacheds-i18n-2.0.0-M15.jar
/org/apache/directory/server/apacheds-kerberos-codec/2.0.0-M15/apacheds-kerberos-codec-2.0.0-M15.jar
/org/apache/hadoop/hadoop-annotations/2.5.0-cdh5.2.0/hadoop-annotations-2.5.0-cdh5.2.0.jar
/org/apache/hadoop/hadoop-auth/2.5.0-cdh5.2.0/hadoop-auth-2.5.0-cdh5.2.0.jar
/org/apache/hadoop/hadoop-client/2.5.0-cdh5.2.0/hadoop-client-2.5.0-cdh5.2.0.jar
/org/apache/hadoop/hadoop-common/2.5.0-cdh5.2.0/hadoop-common-2.5.0-cdh5.2.0.jar
/org/apache/hadoop/hadoop-hdfs/2.5.0-cdh5.2.0/hadoop-hdfs-2.5.0-cdh5.2.0.jar
/org/apache/hadoop/hadoop-mapreduce-client-app/2.5.0-cdh5.2.0/hadoop-mapreduce-client-app-2.5.0-cdh5.2.0.jar
/org/apache/hadoop/hadoop-mapreduce-client-common/2.5.0-cdh5.2.0/hadoop-mapreduce-client-common-2.5.0-cdh5.2.0.jar
/org/apache/hadoop/hadoop-mapreduce-client-core/2.5.0-cdh5.2.0/hadoop-mapreduce-client-core-2.5.0-cdh5.2.0.jar
/org/apache/hadoop/hadoop-mapreduce-client-jobclient/2.5.0-cdh5.2.0/hadoop-mapreduce-client-jobclient-2.5.0-cdh5.2.0.jar
/org/apache/hadoop/hadoop-mapreduce-client-shuffle/2.5.0-cdh5.2.0/hadoop-mapreduce-client-shuffle-2.5.0-cdh5.2.0.jar
/org/apache/hadoop/hadoop-yarn-api/2.5.0-cdh5.2.0/hadoop-yarn-api-2.5.0-cdh5.2.0.jar
/org/apache/hadoop/hadoop-yarn-client/2.5.0-cdh5.2.0/hadoop-yarn-client-2.5.0-cdh5.2.0.jar
/org/apache/hadoop/hadoop-yarn-common/2.5.0-cdh5.2.0/hadoop-yarn-common-2.5.0-cdh5.2.0.jar
/org/apache/hadoop/hadoop-yarn-server-common/2.5.0-cdh5.2.0/hadoop-yarn-server-common-2.5.0-cdh5.2.0.jar
/org/apache/httpcomponents/httpclient/4.1.2/httpclient-4.1.2.jar
/org/apache/httpcomponents/httpcore/4.1.2/httpcore-4.1.2.jar
/org/apache/mesos/mesos/0.18.1/mesos-0.18.1-shaded-protobuf.jar
/org/apache/spark/spark-core_2.10/1.1.0-cdh5.2.0/spark-core_2.10-1.1.0-cdh5.2.0.jar
/org/apache/velocity/velocity/1.7/velocity-1.7.jar
/org/apache/zookeeper/zookeeper/3.4.5/zookeeper-3.4.5.jar
/org/clojure/clojure/1.6.0/clojure-1.6.0.jar
/org/clojure/core.cache/0.6.3/core.cache-0.6.3.jar
/org/clojure/core.memoize/0.5.6/core.memoize-0.5.6.jar
/org/clojure/data.priority-map/0.0.2/data.priority-map-0.0.2.jar
/org/clojure/java.classpath/0.2.0/java.classpath-0.2.0.jar
/org/clojure/java.jdbc/0.3.5/java.jdbc-0.3.5.jar
/org/clojure/math.numeric-tower/0.0.4/math.numeric-tower-0.0.4.jar
/org/clojure/tools.cli/0.3.1/tools.cli-0.3.1.jar
/org/clojure/tools.logging/0.3.1/tools.logging-0.3.1.jar
/org/clojure/tools.namespace/0.2.5/tools.namespace-0.2.5.jar
/org/clojure/tools.nrepl/0.2.5/tools.nrepl-0.2.5.jar
/org/clojure/tools.trace/0.7.8/tools.trace-0.7.8.jar
/org/codehaus/jackson/jackson-core-asl/1.8.8/jackson-core-asl-1.8.8.jar
/org/codehaus/jackson/jackson-jaxrs/1.8.8/jackson-jaxrs-1.8.8.jar
/org/codehaus/jackson/jackson-mapper-asl/1.8.8/jackson-mapper-asl-1.8.8.jar
/org/codehaus/jackson/jackson-xc/1.8.8/jackson-xc-1.8.8.jar
/org/eclipse/jetty/jetty-continuation/8.1.14.v20131031/jetty-continuation-8.1.14.v20131031.jar
/org/eclipse/jetty/jetty-http/8.1.14.v20131031/jetty-http-8.1.14.v20131031.jar
/org/eclipse/jetty/jetty-io/8.1.14.v20131031/jetty-io-8.1.14.v20131031.jar
/org/eclipse/jetty/jetty-jndi/8.1.14.v20131031/jetty-jndi-8.1.14.v20131031.jar
/org/eclipse/jetty/jetty-plus/8.1.14.v20131031/jetty-plus-8.1.14.v20131031.jar
/org/eclipse/jetty/jetty-security/8.1.14.v20131031/jetty-security-8.1.14.v20131031.jar
/org/eclipse/jetty/jetty-server/8.1.14.v20131031/jetty-server-8.1.14.v20131031.jar
/org/eclipse/jetty/jetty-servlet/8.1.14.v20131031/jetty-servlet-8.1.14.v20131031.jar
/org/eclipse/jetty/jetty-util/8.1.14.v20131031/jetty-util-8.1.14.v20131031.jar
/org/eclipse/jetty/jetty-webapp/8.1.14.v20131031/jetty-webapp-8.1.14.v20131031.jar
/org/eclipse/jetty/jetty-xml/8.1.14.v20131031/jetty-xml-8.1.14.v20131031.jar
/org/eclipse/jetty/orbit/javax.activation/1.1.0.v201105071233/javax.activation-1.1.0.v201105071233.jar
/org/eclipse/jetty/orbit/javax.mail.glassfish/1.4.1.v201005082020/javax.mail.glassfish-1.4.1.v201005082020.jar
/org/eclipse/jetty/orbit/javax.servlet/3.0.0.v201112011016/javax.servlet-3.0.0.v201112011016.jar
/org/eclipse/jetty/orbit/javax.transaction/1.1.1.v201105210645/javax.transaction-1.1.1.v201105210645.jar
/org/fusesource/leveldbjni/leveldbjni-all/1.8/leveldbjni-all-1.8.jar
/org/json4s/json4s-ast_2.10/3.2.10/json4s-ast_2.10-3.2.10.jar
/org/json4s/json4s-core_2.10/3.2.10/json4s-core_2.10-3.2.10.jar
/org/json4s/json4s-jackson_2.10/3.2.10/json4s-jackson_2.10-3.2.10.jar
/org/objenesis/objenesis/1.2/objenesis-1.2.jar
/org/ow2/asm/asm/4.0/asm-4.0.jar
/org/scala-lang/scala-compiler/2.10.0/scala-compiler-2.10.0.jar
/org/scala-lang/scala-library/2.10.4/scala-library-2.10.4.jar
/org/scala-lang/scala-reflect/2.10.0/scala-reflect-2.10.0.jar
/org/scala-lang/scalap/2.10.0/scalap-2.10.0.jar
/org/slf4j/jcl-over-slf4j/1.7.5/jcl-over-slf4j-1.7.5.jar
/org/slf4j/jul-to-slf4j/1.7.5/jul-to-slf4j-1.7.5.jar
/org/slf4j/slf4j-api/1.7.7/slf4j-api-1.7.7.jar
/org/slf4j/slf4j-log4j12/1.7.7/slf4j-log4j12-1.7.7.jar
/org/spark-project/akka/akka-actor_2.10/2.2.3-shaded-protobuf/akka-actor_2.10-2.2.3-shaded-protobuf.jar
/org/spark-project/akka/akka-remote_2.10/2.2.3-shaded-protobuf/akka-remote_2.10-2.2.3-shaded-protobuf.jar
/org/spark-project/akka/akka-slf4j_2.10/2.2.3-shaded-protobuf/akka-slf4j_2.10-2.2.3-shaded-protobuf.jar
/org/spark-project/protobuf/protobuf-java/2.4.1-shaded/protobuf-java-2.4.1-shaded.jar
/org/spark-project/pyrolite/2.0.1/pyrolite-2.0.1.jar
/org/tachyonproject/tachyon-client/0.5.0/tachyon-client-0.5.0.jar
/org/tachyonproject/tachyon/0.5.0/tachyon-0.5.0.jar
/org/tcrawley/dynapath/0.2.3/dynapath-0.2.3.jar
/org/tukaani/xz/1.0/xz-1.0.jar
/org/uncommons/maths/uncommons-maths/1.2.2a/uncommons-maths-1.2.2a.jar
/org/xerial/snappy/snappy-java/1.0.5.3/snappy-java-1.0.5.3.jar
/pjstadig/scopes/0.3.0/scopes-0.3.0.jar
/tigris/tigris/0.1.1/tigris-0.1.1.jar
/transduce/transduce/0.1.1/transduce-0.1.1.jar
/xmlenc/xmlenc/0.52/xmlenc-0.52.jar
/spark/pacs-spark/resources
/spark/pacs-spark/src
/spark/pacs-spark/target/classes
/spark/pacs-spark/test
was (Author: cbbetz):
/org/apache/velocity/velocity/1.7/velocity-1.7.jar
/org/apache/zookeeper/zookeeper/3.4.5/zookeeper-3.4.5.jar
/org/clojure/clojure/1.6.0/clojure-1.6.0.jar
/org/clojure/core.cache/0.6.3/core.cache-0.6.3.jar
/org/clojure/core.memoize/0.5.6/core.memoize-0.5.6.jar
/org/clojure/data.priority-map/0.0.2/data.priority-map-0.0.2.jar
/org/clojure/java.classpath/0.2.0/java.classpath-0.2.0.jar
/org/clojure/java.jdbc/0.3.5/java.jdbc-0.3.5.jar
/org/clojure/math.numeric-tower/0.0.4/math.numeric-tower-0.0.4.jar
/org/clojure/tools.cli/0.3.1/tools.cli-0.3.1.jar
/org/clojure/tools.logging/0.3.1/tools.logging-0.3.1.jar
/org/clojure/tools.namespace/0.2.5/tools.namespace-0.2.5.jar
/org/clojure/tools.nrepl/0.2.5/tools.nrepl-0.2.5.jar
/org/clojure/tools.trace/0.7.8/tools.trace-0.7.8.jar
/org/codehaus/jackson/jackson-core-asl/1.9.13/jackson-core-asl-1.9.13.jar
/org/codehaus/jackson/jackson-jaxrs/1.9.13/jackson-jaxrs-1.9.13.jar
/org/codehaus/jackson/jackson-mapper-asl/1.9.13/jackson-mapper-asl-1.9.13.jar
/org/codehaus/jackson/jackson-xc/1.9.13/jackson-xc-1.9.13.jar
/org/eclipse/jetty/jetty-continuation/8.1.14.v20131031/jetty-continuation-8.1.14.v20131031.jar
/org/eclipse/jetty/jetty-http/8.1.14.v20131031/jetty-http-8.1.14.v20131031.jar
/org/eclipse/jetty/jetty-io/8.1.14.v20131031/jetty-io-8.1.14.v20131031.jar
/org/eclipse/jetty/jetty-jndi/8.1.14.v20131031/jetty-jndi-8.1.14.v20131031.jar
/org/eclipse/jetty/jetty-plus/8.1.14.v20131031/jetty-plus-8.1.14.v20131031.jar
/org/eclipse/jetty/jetty-security/8.1.14.v20131031/jetty-security-8.1.14.v20131031.jar
/org/eclipse/jetty/jetty-server/8.1.14.v20131031/jetty-server-8.1.14.v20131031.jar
/org/eclipse/jetty/jetty-servlet/8.1.14.v20131031/jetty-servlet-8.1.14.v20131031.jar
/org/eclipse/jetty/jetty-util/8.1.14.v20131031/jetty-util-8.1.14.v20131031.jar
/org/eclipse/jetty/jetty-webapp/8.1.14.v20131031/jetty-webapp-8.1.14.v20131031.jar
/org/eclipse/jetty/jetty-xml/8.1.14.v20131031/jetty-xml-8.1.14.v20131031.jar
/org/eclipse/jetty/orbit/javax.activation/1.1.0.v201105071233/javax.activation-1.1.0.v201105071233.jar
/org/eclipse/jetty/orbit/javax.mail.glassfish/1.4.1.v201005082020/javax.mail.glassfish-1.4.1.v201005082020.jar
/org/eclipse/jetty/orbit/javax.servlet/3.0.0.v201112011016/javax.servlet-3.0.0.v201112011016.jar
/org/eclipse/jetty/orbit/javax.transaction/1.1.1.v201105210645/javax.transaction-1.1.1.v201105210645.jar
/org/fusesource/leveldbjni/leveldbjni-all/1.8/leveldbjni-all-1.8.jar
/org/htrace/htrace-core/3.0.4/htrace-core-3.0.4.jar
/org/json4s/json4s-ast_2.10/3.2.10/json4s-ast_2.10-3.2.10.jar
/org/json4s/json4s-core_2.10/3.2.10/json4s-core_2.10-3.2.10.jar
/org/json4s/json4s-jackson_2.10/3.2.10/json4s-jackson_2.10-3.2.10.jar
/org/objenesis/objenesis/1.2/objenesis-1.2.jar
/org/ow2/asm/asm/4.0/asm-4.0.jar
/org/scala-lang/scala-compiler/2.10.0/scala-compiler-2.10.0.jar
/org/scala-lang/scala-library/2.10.4/scala-library-2.10.4.jar
/org/scala-lang/scala-reflect/2.10.0/scala-reflect-2.10.0.jar
/org/scala-lang/scalap/2.10.0/scalap-2.10.0.jar
/org/slf4j/jcl-over-slf4j/1.7.5/jcl-over-slf4j-1.7.5.jar
/org/slf4j/jul-to-slf4j/1.7.5/jul-to-slf4j-1.7.5.jar
/org/slf4j/slf4j-api/1.7.7/slf4j-api-1.7.7.jar
/org/slf4j/slf4j-log4j12/1.7.7/slf4j-log4j12-1.7.7.jar
/org/spark-project/akka/akka-actor_2.10/2.2.3-shaded-protobuf/akka-actor_2.10-2.2.3-shaded-protobuf.jar
/org/spark-project/akka/akka-remote_2.10/2.2.3-shaded-protobuf/akka-remote_2.10-2.2.3-shaded-protobuf.jar
/org/spark-project/akka/akka-slf4j_2.10/2.2.3-shaded-protobuf/akka-slf4j_2.10-2.2.3-shaded-protobuf.jar
/org/spark-project/protobuf/protobuf-java/2.4.1-shaded/protobuf-java-2.4.1-shaded.jar
/org/spark-project/pyrolite/2.0.1/pyrolite-2.0.1.jar
/org/tachyonproject/tachyon-client/0.5.0/tachyon-client-0.5.0.jar
/org/tachyonproject/tachyon/0.5.0/tachyon-0.5.0.jar
/org/tcrawley/dynapath/0.2.3/dynapath-0.2.3.jar
/org/tukaani/xz/1.0/xz-1.0.jar
/org/uncommons/maths/uncommons-maths/1.2.2a/uncommons-maths-1.2.2a.jar
/org/xerial/snappy/snappy-java/1.0.5.3/snappy-java-1.0.5.3.jar
/pjstadig/scopes/0.3.0/scopes-0.3.0.jar
/tigris/tigris/0.1.1/tigris-0.1.1.jar
/tomcat/jasper-runtime/5.5.23/jasper-runtime-5.5.23.jar
/transduce/transduce/0.1.1/transduce-0.1.1.jar
/xerces/xercesImpl/2.9.1/xercesImpl-2.9.1.jar
/xml-apis/xml-apis/1.3.04/xml-apis-1.3.04.jar
/xmlenc/xmlenc/0.52/xmlenc-0.52.jar
/spark/pacs-spark/resources
/spark/pacs-spark/src
/spark/pacs-spark/target/classes
/spark/pacs-spark/test
And here's my dependency list with Spark-1.1.0-CDH:
/cheshire/cheshire/5.3.1/cheshire-5.3.1.jar
/cider/cider-nrepl/0.8.2/cider-nrepl-0.8.2.jar
/clj-logging-config/clj-logging-config/1.9.12/clj-logging-config-1.9.12.jar
/clj-time/clj-time/0.8.0/clj-time-0.8.0.jar
/cljs-tooling/cljs-tooling/0.1.3/cljs-tooling-0.1.3.jar
/colt/colt/1.2.0/colt-1.2.0.jar
/com/clearspring/analytics/stream/2.7.0/stream-2.7.0.jar
/com/codahale/metrics/metrics-core/3.0.0/metrics-core-3.0.0.jar
/com/codahale/metrics/metrics-graphite/3.0.0/metrics-graphite-3.0.0.jar
/com/codahale/metrics/metrics-json/3.0.0/metrics-json-3.0.0.jar
/com/codahale/metrics/metrics-jvm/3.0.0/metrics-jvm-3.0.0.jar
/com/damballa/abracad/0.4.11/abracad-0.4.11.jar
/com/damballa/parkour/0.6.1/parkour-0.6.1.jar
/com/esotericsoftware/kryo/kryo/2.21/kryo-2.21.jar
/com/esotericsoftware/minlog/minlog/1.2/minlog-1.2.jar
/com/esotericsoftware/reflectasm/reflectasm/1.07/reflectasm-1.07-shaded.jar
/com/fasterxml/jackson/core/jackson-annotations/2.3.0/jackson-annotations-2.3.0.jar
/com/fasterxml/jackson/core/jackson-core/2.3.1/jackson-core-2.3.1.jar
/com/fasterxml/jackson/core/jackson-databind/2.3.1/jackson-databind-2.3.1.jar
/com/fasterxml/jackson/dataformat/jackson-dataformat-smile/2.3.1/jackson-dataformat-smile-2.3.1.jar
/com/google/code/findbugs/jsr305/1.3.9/jsr305-1.3.9.jar
/com/google/code/gson/gson/2.2.4/gson-2.2.4.jar
/com/google/guava/guava/14.0.1/guava-14.0.1.jar
/com/google/protobuf/protobuf-java/2.5.0/protobuf-java-2.5.0.jar
/com/jamesmurty/utils/java-xmlbuilder/0.4/java-xmlbuilder-0.4.jar
/com/ning/compress-lzf/1.0.0/compress-lzf-1.0.0.jar
/com/sun/jersey/jersey-client/1.9/jersey-client-1.9.jar
/com/sun/jersey/jersey-core/1.9/jersey-core-1.9.jar
/com/thoughtworks/paranamer/paranamer/2.3/paranamer-2.3.jar
/com/twitter/carbonite/1.4.0/carbonite-1.4.0.jar
/com/twitter/chill-java/0.3.6/chill-java-0.3.6.jar
/com/twitter/chill_2.10/0.5.2/chill_2.10-0.5.2.jar
/com/typesafe/config/1.0.2/config-1.0.2.jar
/commons-beanutils/commons-beanutils-core/1.8.0/commons-beanutils-core-1.8.0.jar
/commons-beanutils/commons-beanutils/1.7.0/commons-beanutils-1.7.0.jar
/commons-cli/commons-cli/1.2/commons-cli-1.2.jar
/commons-codec/commons-codec/1.4/commons-codec-1.4.jar
/commons-collections/commons-collections/3.2.1/commons-collections-3.2.1.jar
/commons-configuration/commons-configuration/1.6/commons-configuration-1.6.jar
/commons-digester/commons-digester/1.8/commons-digester-1.8.jar
/commons-httpclient/commons-httpclient/3.1/commons-httpclient-3.1.jar
/commons-io/commons-io/2.4/commons-io-2.4.jar
/commons-lang/commons-lang/2.6/commons-lang-2.6.jar
/commons-logging/commons-logging/1.1.3/commons-logging-1.1.3.jar
/commons-net/commons-net/2.2/commons-net-2.2.jar
/compliment/compliment/0.2.0/compliment-0.2.0.jar
/concurrent/concurrent/1.3.4/concurrent-1.3.4.jar
/gorillalabs/config/1.0.0/config-1.0.0.jar
/gorillalabs/sparkling/1.1.0/sparkling-1.1.0.jar
/io/netty/netty-all/4.0.23.Final/netty-all-4.0.23.Final.jar
/io/netty/netty/3.6.6.Final/netty-3.6.6.Final.jar
/javax/activation/activation/1.1/activation-1.1.jar
/javax/xml/bind/jaxb-api/2.2.2/jaxb-api-2.2.2.jar
/javax/xml/stream/stax-api/1.0-2/stax-api-1.0-2.jar
/jline/jline/0.9.94/jline-0.9.94.jar
/joda-time/joda-time/2.3/joda-time-2.3.jar
/log4j/log4j/1.2.17/log4j-1.2.17.jar
/mysql/mysql-connector-java/5.1.31/mysql-connector-java-5.1.31.jar
/net/java/dev/jets3t/jets3t/0.9.0/jets3t-0.9.0.jar
/net/jpountz/lz4/lz4/1.2.0/lz4-1.2.0.jar
/net/sf/py4j/py4j/0.8.2.1/py4j-0.8.2.1.jar
/org/apache/avro/avro-ipc/1.7.7/avro-ipc-1.7.7-tests.jar
/org/apache/avro/avro-ipc/1.7.7/avro-ipc-1.7.7.jar
/org/apache/avro/avro-mapred/1.7.7/avro-mapred-1.7.7-hadoop2.jar
/org/apache/avro/avro/1.7.7/avro-1.7.7.jar
/org/apache/commons/commons-compress/1.4.1/commons-compress-1.4.1.jar
/org/apache/commons/commons-lang3/3.3.2/commons-lang3-3.3.2.jar
/org/apache/commons/commons-math3/3.1.1/commons-math3-3.1.1.jar
/org/apache/curator/curator-client/2.6.0/curator-client-2.6.0.jar
/org/apache/curator/curator-framework/2.4.0/curator-framework-2.4.0.jar
/org/apache/curator/curator-recipes/2.4.0/curator-recipes-2.4.0.jar
/org/apache/directory/api/api-asn1-api/1.0.0-M20/api-asn1-api-1.0.0-M20.jar
/org/apache/directory/api/api-util/1.0.0-M20/api-util-1.0.0-M20.jar
/org/apache/directory/server/apacheds-i18n/2.0.0-M15/apacheds-i18n-2.0.0-M15.jar
/org/apache/directory/server/apacheds-kerberos-codec/2.0.0-M15/apacheds-kerberos-codec-2.0.0-M15.jar
/org/apache/hadoop/hadoop-annotations/2.5.0-cdh5.2.0/hadoop-annotations-2.5.0-cdh5.2.0.jar
/org/apache/hadoop/hadoop-auth/2.5.0-cdh5.2.0/hadoop-auth-2.5.0-cdh5.2.0.jar
/org/apache/hadoop/hadoop-client/2.5.0-cdh5.2.0/hadoop-client-2.5.0-cdh5.2.0.jar
/org/apache/hadoop/hadoop-common/2.5.0-cdh5.2.0/hadoop-common-2.5.0-cdh5.2.0.jar
/org/apache/hadoop/hadoop-hdfs/2.5.0-cdh5.2.0/hadoop-hdfs-2.5.0-cdh5.2.0.jar
/org/apache/hadoop/hadoop-mapreduce-client-app/2.5.0-cdh5.2.0/hadoop-mapreduce-client-app-2.5.0-cdh5.2.0.jar
/org/apache/hadoop/hadoop-mapreduce-client-common/2.5.0-cdh5.2.0/hadoop-mapreduce-client-common-2.5.0-cdh5.2.0.jar
/org/apache/hadoop/hadoop-mapreduce-client-core/2.5.0-cdh5.2.0/hadoop-mapreduce-client-core-2.5.0-cdh5.2.0.jar
/org/apache/hadoop/hadoop-mapreduce-client-jobclient/2.5.0-cdh5.2.0/hadoop-mapreduce-client-jobclient-2.5.0-cdh5.2.0.jar
/org/apache/hadoop/hadoop-mapreduce-client-shuffle/2.5.0-cdh5.2.0/hadoop-mapreduce-client-shuffle-2.5.0-cdh5.2.0.jar
/org/apache/hadoop/hadoop-yarn-api/2.5.0-cdh5.2.0/hadoop-yarn-api-2.5.0-cdh5.2.0.jar
/org/apache/hadoop/hadoop-yarn-client/2.5.0-cdh5.2.0/hadoop-yarn-client-2.5.0-cdh5.2.0.jar
/org/apache/hadoop/hadoop-yarn-common/2.5.0-cdh5.2.0/hadoop-yarn-common-2.5.0-cdh5.2.0.jar
/org/apache/hadoop/hadoop-yarn-server-common/2.5.0-cdh5.2.0/hadoop-yarn-server-common-2.5.0-cdh5.2.0.jar
/org/apache/httpcomponents/httpclient/4.1.2/httpclient-4.1.2.jar
/org/apache/httpcomponents/httpcore/4.1.2/httpcore-4.1.2.jar
/org/apache/mesos/mesos/0.18.1/mesos-0.18.1-shaded-protobuf.jar
/org/apache/spark/spark-core_2.10/1.1.0-cdh5.2.0/spark-core_2.10-1.1.0-cdh5.2.0.jar
/org/apache/velocity/velocity/1.7/velocity-1.7.jar
/org/apache/zookeeper/zookeeper/3.4.5/zookeeper-3.4.5.jar
/org/clojure/clojure/1.6.0/clojure-1.6.0.jar
/org/clojure/core.cache/0.6.3/core.cache-0.6.3.jar
/org/clojure/core.memoize/0.5.6/core.memoize-0.5.6.jar
/org/clojure/data.priority-map/0.0.2/data.priority-map-0.0.2.jar
/org/clojure/java.classpath/0.2.0/java.classpath-0.2.0.jar
/org/clojure/java.jdbc/0.3.5/java.jdbc-0.3.5.jar
/org/clojure/math.numeric-tower/0.0.4/math.numeric-tower-0.0.4.jar
/org/clojure/tools.cli/0.3.1/tools.cli-0.3.1.jar
/org/clojure/tools.logging/0.3.1/tools.logging-0.3.1.jar
/org/clojure/tools.namespace/0.2.5/tools.namespace-0.2.5.jar
/org/clojure/tools.nrepl/0.2.5/tools.nrepl-0.2.5.jar
/org/clojure/tools.trace/0.7.8/tools.trace-0.7.8.jar
/org/codehaus/jackson/jackson-core-asl/1.8.8/jackson-core-asl-1.8.8.jar
/org/codehaus/jackson/jackson-jaxrs/1.8.8/jackson-jaxrs-1.8.8.jar
/org/codehaus/jackson/jackson-mapper-asl/1.8.8/jackson-mapper-asl-1.8.8.jar
/org/codehaus/jackson/jackson-xc/1.8.8/jackson-xc-1.8.8.jar
/org/eclipse/jetty/jetty-continuation/8.1.14.v20131031/jetty-continuation-8.1.14.v20131031.jar
/org/eclipse/jetty/jetty-http/8.1.14.v20131031/jetty-http-8.1.14.v20131031.jar
/org/eclipse/jetty/jetty-io/8.1.14.v20131031/jetty-io-8.1.14.v20131031.jar
/org/eclipse/jetty/jetty-jndi/8.1.14.v20131031/jetty-jndi-8.1.14.v20131031.jar
/org/eclipse/jetty/jetty-plus/8.1.14.v20131031/jetty-plus-8.1.14.v20131031.jar
/org/eclipse/jetty/jetty-security/8.1.14.v20131031/jetty-security-8.1.14.v20131031.jar
/org/eclipse/jetty/jetty-server/8.1.14.v20131031/jetty-server-8.1.14.v20131031.jar
/org/eclipse/jetty/jetty-servlet/8.1.14.v20131031/jetty-servlet-8.1.14.v20131031.jar
/org/eclipse/jetty/jetty-util/8.1.14.v20131031/jetty-util-8.1.14.v20131031.jar
/org/eclipse/jetty/jetty-webapp/8.1.14.v20131031/jetty-webapp-8.1.14.v20131031.jar
/org/eclipse/jetty/jetty-xml/8.1.14.v20131031/jetty-xml-8.1.14.v20131031.jar
/org/eclipse/jetty/orbit/javax.activation/1.1.0.v201105071233/javax.activation-1.1.0.v201105071233.jar
/org/eclipse/jetty/orbit/javax.mail.glassfish/1.4.1.v201005082020/javax.mail.glassfish-1.4.1.v201005082020.jar
/org/eclipse/jetty/orbit/javax.servlet/3.0.0.v201112011016/javax.servlet-3.0.0.v201112011016.jar
/org/eclipse/jetty/orbit/javax.transaction/1.1.1.v201105210645/javax.transaction-1.1.1.v201105210645.jar
/org/fusesource/leveldbjni/leveldbjni-all/1.8/leveldbjni-all-1.8.jar
/org/json4s/json4s-ast_2.10/3.2.10/json4s-ast_2.10-3.2.10.jar
/org/json4s/json4s-core_2.10/3.2.10/json4s-core_2.10-3.2.10.jar
/org/json4s/json4s-jackson_2.10/3.2.10/json4s-jackson_2.10-3.2.10.jar
/org/objenesis/objenesis/1.2/objenesis-1.2.jar
/org/ow2/asm/asm/4.0/asm-4.0.jar
/org/scala-lang/scala-compiler/2.10.0/scala-compiler-2.10.0.jar
/org/scala-lang/scala-library/2.10.4/scala-library-2.10.4.jar
/org/scala-lang/scala-reflect/2.10.0/scala-reflect-2.10.0.jar
/org/scala-lang/scalap/2.10.0/scalap-2.10.0.jar
/org/slf4j/jcl-over-slf4j/1.7.5/jcl-over-slf4j-1.7.5.jar
/org/slf4j/jul-to-slf4j/1.7.5/jul-to-slf4j-1.7.5.jar
/org/slf4j/slf4j-api/1.7.7/slf4j-api-1.7.7.jar
/org/slf4j/slf4j-log4j12/1.7.7/slf4j-log4j12-1.7.7.jar
/org/spark-project/akka/akka-actor_2.10/2.2.3-shaded-protobuf/akka-actor_2.10-2.2.3-shaded-protobuf.jar
/org/spark-project/akka/akka-remote_2.10/2.2.3-shaded-protobuf/akka-remote_2.10-2.2.3-shaded-protobuf.jar
/org/spark-project/akka/akka-slf4j_2.10/2.2.3-shaded-protobuf/akka-slf4j_2.10-2.2.3-shaded-protobuf.jar
/org/spark-project/protobuf/protobuf-java/2.4.1-shaded/protobuf-java-2.4.1-shaded.jar
/org/spark-project/pyrolite/2.0.1/pyrolite-2.0.1.jar
/org/tachyonproject/tachyon-client/0.5.0/tachyon-client-0.5.0.jar
/org/tachyonproject/tachyon/0.5.0/tachyon-0.5.0.jar
/org/tcrawley/dynapath/0.2.3/dynapath-0.2.3.jar
/org/tukaani/xz/1.0/xz-1.0.jar
/org/uncommons/maths/uncommons-maths/1.2.2a/uncommons-maths-1.2.2a.jar
/org/xerial/snappy/snappy-java/1.0.5.3/snappy-java-1.0.5.3.jar
/pjstadig/scopes/0.3.0/scopes-0.3.0.jar
/tigris/tigris/0.1.1/tigris-0.1.1.jar
/transduce/transduce/0.1.1/transduce-0.1.1.jar
/xmlenc/xmlenc/0.52/xmlenc-0.52.jar
/spark/pacs-spark/resources
/spark/pacs-spark/src
/spark/pacs-spark/target/classes
/spark/pacs-spark/test
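
For anyone who wants to reproduce the comparison of the two dependency lists above, here is a minimal sketch. It is not the tool the lists were produced with. It assumes Maven-repository-style entries (.../artifact/version/artifact-version[-classifier].jar), and the function names parse_entry and diff_classpaths are made up for illustration.

```python
from pathlib import PurePosixPath


def parse_entry(path):
    """Split a Maven-style classpath entry into (artifact key, version).

    Assumes the layout .../<artifact>/<version>/<artifact>-<version>[-classifier].jar.
    Anything else (e.g. a plain source directory) is returned unchanged
    with an empty version.
    """
    p = PurePosixPath(path.strip())
    if p.suffix != ".jar" or len(p.parts) < 4:
        return str(p), ""
    version = p.parent.name
    artifact_dir = p.parent.parent
    # Drop the version from the file name so that a classifier such as
    # "-tests" or "-shaded" stays part of the key.
    name = p.name.replace("-" + version, "", 1)
    return f"{artifact_dir}/{name}", version


def diff_classpaths(left, right):
    """Return (only in left, only in right, {key: (left version, right version)})."""
    l = dict(parse_entry(e) for e in left if e.strip())
    r = dict(parse_entry(e) for e in right if e.strip())
    only_left = sorted(l.keys() - r.keys())
    only_right = sorted(r.keys() - l.keys())
    changed = {k: (l[k], r[k]) for k in sorted(l.keys() & r.keys()) if l[k] != r[k]}
    return only_left, only_right, changed
```

Feeding it the two lists as sequences of lines (Spark 1.1.0/Hadoop 2.6.0 on the left, CDH on the right) gives the entries present on only one side, plus the artifacts whose version differs between the two runs.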
> Shuffle write increases
> -----------------------
>
> Key: SPARK-5081
> URL: https://issues.apache.org/jira/browse/SPARK-5081
> Project: Spark
> Issue Type: Bug
> Components: Shuffle
> Affects Versions: 1.2.0
> Reporter: Kevin Jung
> Priority: Critical
> Attachments: Spark_Debug.pdf
>
>
> The shuffle write size shown in the Spark web UI differs considerably when I
> run the same Spark job on the same input data in Spark 1.1 and Spark 1.2.
> At the sortBy stage, the shuffle write is 98.1MB in Spark 1.1 but 146.9MB
> in Spark 1.2.
> I set the spark.shuffle.manager option to hash because its default value
> changed, but Spark 1.2 still writes more shuffle output than Spark 1.1.
> It can increase disk I/O overhead exponentially as the input file gets bigger,
> and it causes the jobs to take more time to complete.
> In the case of about 100GB input, for example, the size of shuffle write is
> 39.7GB in spark 1.1 but 91.0GB in spark 1.2.
> spark 1.1
> ||Stage Id||Description||Input||Shuffle Read||Shuffle Write||
> |9|saveAsTextFile| |1169.4KB| |
> |12|combineByKey| |1265.4KB|1275.0KB|
> |6|sortByKey| |1276.5KB| |
> |8|mapPartitions| |91.0MB|1383.1KB|
> |4|apply| |89.4MB| |
> |5|sortBy|155.6MB| |98.1MB|
> |3|sortBy|155.6MB| | |
> |1|collect| |2.1MB| |
> |2|mapValues|155.6MB| |2.2MB|
> |0|first|184.4KB| | |
> spark 1.2
> ||Stage Id||Description||Input||Shuffle Read||Shuffle Write||
> |12|saveAsTextFile| |1170.2KB| |
> |11|combineByKey| |1264.5KB|1275.0KB|
> |8|sortByKey| |1273.6KB| |
> |7|mapPartitions| |134.5MB|1383.1KB|
> |5|zipWithIndex| |132.5MB| |
> |4|sortBy|155.6MB| |146.9MB|
> |3|sortBy|155.6MB| | |
> |2|collect| |2.0MB| |
> |1|mapValues|155.6MB| |2.2MB|
> |0|first|184.4KB| | |
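
To put the reported sizes in proportion, the following small sketch computes the growth factor of the shuffle write between Spark 1.1 and Spark 1.2 for the two cases quoted above. The numbers are taken from the issue description. The helper name growth is only illustrative.

```python
def growth(before, after):
    """Return (ratio after/before, percentage increase)."""
    ratio = after / before
    return ratio, (ratio - 1.0) * 100.0


# sortBy stage, shuffle write in MB (Spark 1.1 vs Spark 1.2)
small = growth(98.1, 146.9)
# roughly 100GB input, shuffle write in GB (Spark 1.1 vs Spark 1.2)
large = growth(39.7, 91.0)

print("sortBy stage: x%.2f (+%.0f%%)" % small)
print("100GB input:  x%.2f (+%.0f%%)" % large)
```

The factor is larger for the bigger input (about 2.3 compared with about 1.5), which matches the report that the overhead grows with input size. Two data points do not say how it grows.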