[ https://issues.apache.org/jira/browse/SPARK-16022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15337615#comment-15337615 ]

Sean Owen commented on SPARK-16022:
-----------------------------------

Questions like this belong on the u...@spark.apache.org mailing list; see http://spark.apache.org/community.html

> Input size is different when I use 1 or 3 nodes, but the shuffle size 
> remains roughly equal; do you know why?
> ------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-16022
>                 URL: https://issues.apache.org/jira/browse/SPARK-16022
>             Project: Spark
>          Issue Type: Test
>            Reporter: jon
>
> I ran some queries on Spark, first with just one node and then with 3 nodes, 
> and in the Spark UI on port 4040 I see something I don't understand.
> For example, after executing a query with 3 nodes and checking the results in 
> the Spark UI, the "Input" column shows 2.8 GB, so Spark read 2.8 GB from 
> Hadoop. The same query with just one node in local mode shows 7.3 GB, so 
> Spark read 7.3 GB from Hadoop. But shouldn't these values be equal?
> By contrast, the shuffle sizes stay roughly equal between one node and 3. 
> Why doesn't the input value stay equal too? The same amount of data has to 
> be read from HDFS, so I don't understand what is happening.
> Do you know why?
> Single node:
> Input: 7.3 GB
> Shuffle read: 208.1 KB
> Shuffle write: 208.1 KB
> 3 nodes:
> Input: 2.8 GB
> Shuffle read: 193.3 KB
> Shuffle write: 208.1 KB
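
For anyone trying to reproduce the comparison outside the web UI, here is a 
minimal sketch that sums the same per-task input metrics with a SparkListener 
(Spark 2.x API; the app name and HDFS path below are placeholders, not taken 
from the report). Running it once per cluster size should print the same 
"Input" total the UI shows:

    import java.util.concurrent.atomic.AtomicLong

    import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}
    import org.apache.spark.sql.SparkSession

    object InputMetricsCheck {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("input-metrics-check").getOrCreate()

        // Sum the bytes each task reports as read from its input source.
        val bytesRead = new AtomicLong(0L)
        spark.sparkContext.addSparkListener(new SparkListener {
          override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
            // taskMetrics can be null for tasks that failed before reporting
            if (taskEnd.taskMetrics != null) {
              bytesRead.addAndGet(taskEnd.taskMetrics.inputMetrics.bytesRead)
            }
          }
        })

        // Placeholder path: point this at the same HDFS data as the query above.
        spark.read.textFile("hdfs:///path/to/data").count()

        println(s"Total input bytes read: ${bytesRead.get()}")
        spark.stop()
      }
    }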


