[jira] [Created] (SPARK-16022) Input size is different when I use 1 or 3 nodes but the shuffle size remains roughly equal, do you know why?

jon (JIRA) Fri, 17 Jun 2016 12:27:47 -0700

jon created SPARK-16022:
---------------------------

             Summary: Input size is different when I use 1 or 3 nodes but the 
shuffle size remains roughly equal, do you know why?
                 Key: SPARK-16022
                 URL: https://issues.apache.org/jira/browse/SPARK-16022
             Project: Spark
          Issue Type: Test
            Reporter: jon



I ran some queries on Spark with just one node and then with 3 nodes, and in 
the Spark UI (port 4040) I see something that I do not understand.

For example, after executing a query with 3 nodes and checking the results in 
the Spark UI, the "Input" value shows 2.8 GB, so Spark read 2.8 GB from Hadoop. 
For the same query with just one node in local mode, it shows 7.3 GB, so 
Spark read 7.3 GB from Hadoop. Shouldn't these two values be equal?

The shuffle values, for example, stay roughly equal with one node versus 3. Why 
doesn't the input value stay equal as well? The same amount of data must be 
read from HDFS, so I do not understand the difference.

Do you know why?

Single node:

Input: 7.3 GB
Shuffle read: 208.1 KB
Shuffle write: 208.1 KB

3 nodes:

Input: 2.8 GB
Shuffle read: 193.3 KB
Shuffle write: 208.1 KB
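
To put the two runs side by side, here is a minimal sketch that computes the single-node to 3-node ratio for each reported metric. It only restates the figures above and does not explain the difference. It assumes decimal units (1 GB = 10^9 bytes, 1 KB = 10^3 bytes); the dictionary names and the `ratios` helper are hypothetical and not part of Spark.

```python
# Figures exactly as reported in this issue.
# Assumption: decimal units (1 GB = 1e9 bytes, 1 KB = 1e3 bytes).
GB = 1e9
KB = 1e3

single_node = {
    "input": 7.3 * GB,
    "shuffle_read": 208.1 * KB,
    "shuffle_write": 208.1 * KB,
}
three_nodes = {
    "input": 2.8 * GB,
    "shuffle_read": 193.3 * KB,
    "shuffle_write": 208.1 * KB,
}


def ratios(a, b):
    """Return a[k] / b[k] for every metric k (hypothetical helper)."""
    return {k: a[k] / b[k] for k in a}


r = ratios(single_node, three_nodes)
for metric, value in r.items():
    print(f"{metric}: single-node / 3-node = {value:.2f}")
```

With these numbers the input ratio is about 2.6, while shuffle read differs by under 8% and shuffle write is identical.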



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

