[ https://issues.apache.org/jira/browse/SPARK-16022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen resolved SPARK-16022.
-------------------------------
Resolution: Not A Problem
This should be a question at user@ first.
https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark
> Input size is different when I use 1 or 3 nodes but the shuffle size remains
> +- equal, do you know why?
> ------------------------------------------------------------------------------------------------------
>
> Key: SPARK-16022
> URL: https://issues.apache.org/jira/browse/SPARK-16022
> Project: Spark
> Issue Type: Test
> Reporter: jon
>
> I ran some queries on Spark with just one node and then with 3 nodes, and in
> the Spark UI (port 4040) I see something that I do not understand.
> For example, after executing a query with 3 nodes and checking the results in
> the Spark UI, the "input" tab shows 2,8 GB, so Spark read 2,8 GB from Hadoop.
> For the same query with just one node in local mode it shows 7,3 GB, so
> Spark read 7,3 GB from Hadoop. Shouldn't these values be equal?
> The shuffle size, for example, stays roughly equal with one node vs. 3. Why
> doesn't the input value stay equal too? The same amount of data must be read
> from HDFS, so I do not understand.
> Do you know why?
> Single node:
> Input: 7,3 GB
> Shuffle read: 208.1 KB
> Shuffle write: 208.1 KB
> 3 nodes:
> Input: 2,8 GB
> Shuffle read: 193,3 KB
> Shuffle write: 208.1 KB