jon created SPARK-16022:
---------------------------
Summary: Input size is different when I use 1 or 3 nodes but the
shuffle size remains roughly equal, do you know why?
Key: SPARK-16022
URL: https://issues.apache.org/jira/browse/SPARK-16022
Project: Spark
Issue Type: Test
Reporter: jon
I ran some queries on Spark with just one node and then with 3 nodes, and in
the Spark UI (port 4040) I see something that I do not understand.
For example, after executing a query with 3 nodes and checking the results in
the Spark UI, the "input" tab shows 2.8 GB, so Spark read 2.8 GB from Hadoop.
The same query against Hadoop with just one node in local mode shows 7.3 GB,
so Spark read 7.3 GB from Hadoop. Shouldn't these two values be equal?
The shuffle size, for example, stays roughly equal with one node vs. 3. Why
doesn't the input value stay equal as well? The same amount of data must be
read from HDFS, so I do not understand the difference.
Do you know why?
Single node:
Input: 7.3 GB
Shuffle read: 208.1 KB
Shuffle write: 208.1 KB
3 nodes:
Input: 2.8 GB
Shuffle read: 193.3 KB
Shuffle write: 208.1 KB