[ https://issues.apache.org/jira/browse/SPARK-16022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15337615#comment-15337615 ]
Sean Owen commented on SPARK-16022:
-----------------------------------

The u...@spark.apache.org mailing list is the right place for questions like this: http://spark.apache.org/community.html

> Input size is different when I use 1 or 3 nodes but the shuffle size remains
> +- equal, do you know why?
> ------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-16022
>                 URL: https://issues.apache.org/jira/browse/SPARK-16022
>             Project: Spark
>          Issue Type: Test
>            Reporter: jon
>
> I ran some queries on Spark, first with just one node and then with 3 nodes,
> and in the Spark UI at :4040 I see something that I don't understand.
> For example, after executing a query with 3 nodes, the "Input" column in the
> Spark UI shows 2.8 GB, so Spark read 2.8 GB from Hadoop.
> The same query with just one node in local mode shows 7.3 GB, i.e. Spark
> read 7.3 GB from Hadoop. Shouldn't these values be equal?
> The shuffle values, by contrast, stay roughly equal between one node and
> three. Why doesn't the input value stay equal as well? The same amount of
> data should be read from HDFS, so I don't understand it.
> Do you know why?
>
> Single node:
> Input: 7.3 GB
> Shuffle read: 208.1 KB
> Shuffle write: 208.1 KB
>
> 3 nodes:
> Input: 2.8 GB
> Shuffle read: 193.3 KB
> Shuffle write: 208.1 KB

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)