[ https://issues.apache.org/jira/browse/SPARK-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14324154#comment-14324154 ]
Dr. Christian Betz edited comment on SPARK-5081 at 2/17/15 12:58 PM:
---------------------------------------------------------------------
That's Spark 1.1.0 with Hadoop 2.5.0, in addition to the attached document [^Spark_Debug.pdf]:
The log shows a lot of spilling from org.apache.spark.util.collection.ExternalAppendOnlyMap.
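For reference, a minimal sketch (a standalone snippet, not the job from the attached PDF; the app name, local master, and values are only examples) of the Spark 1.1.x settings that govern spilling in ExternalAppendOnlyMap:
{code:scala}
import org.apache.spark.{SparkConf, SparkContext}

// ExternalAppendOnlyMap starts spilling to disk once in-memory aggregation maps
// exceed the heap share given by spark.shuffle.memoryFraction (default 0.2 in 1.1).
val conf = new SparkConf()
  .setAppName("spill-tuning-sketch")          // illustrative name
  .setMaster("local[*]")                      // local master so the snippet runs standalone
  .set("spark.shuffle.memoryFraction", "0.4") // example value: give aggregation more heap
  .set("spark.shuffle.spill", "true")         // the default; spilling stays enabled
val sc = new SparkContext(conf)
{code}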
Task performance in Stage 10 is low (minutes rather than tens of seconds).
No shuffle spill is reported in the Web UI:
*Details for Stage 10*
Total task time across all tasks: 0 ms
*Summary Metrics for 3 Completed Tasks*
||Metric||Min||25th percentile||Median||75th percentile||Max||
|Result serialization time|0 ms|0 ms|0 ms|0 ms|0 ms|
|Duration|4.8 min|4.8 min|5.0 min|5.0 min|5.0 min|
|Time spent fetching task results|0 ms|0 ms|0 ms|0 ms|0 ms|
|Scheduler delay|33 ms|33 ms|34 ms|45 ms|45 ms|
*Aggregated Metrics by Executor*
||Executor ID||Address||Task Time||Total Tasks||Failed Tasks||Succeeded Tasks||Input||Shuffle Read||Shuffle Write||Shuffle Spill (Memory)||Shuffle Spill (Disk)||
|localhost|CANNOT FIND ADDRESS|15 min|3|0|3|0.0 B|0.0 B|0.0 B|0.0 B|0.0 B|
*Tasks*
||Index||ID||Attempt||Status||Locality Level||Executor||Launch Time||Duration||GC Time||Accumulators||Errors||
|0|291|0|SUCCESS|ANY|localhost|2015/02/17 13:48:39|4.8 min|35 s| | |
|1|292|0|SUCCESS|ANY|localhost|2015/02/17 13:48:39|5.0 min|35 s| | |
|2|293|0|SUCCESS|ANY|localhost|2015/02/17 13:48:39|5.0 min|35 s| | |
> Shuffle write increases
> -----------------------
>
> Key: SPARK-5081
> URL: https://issues.apache.org/jira/browse/SPARK-5081
> Project: Spark
> Issue Type: Bug
> Components: Shuffle
> Affects Versions: 1.2.0
> Reporter: Kevin Jung
> Priority: Critical
> Attachments: Spark_Debug.pdf
>
>
> The size of the shuffle write shown in the Spark Web UI differs greatly when I
> execute the same Spark job with the same input data on Spark 1.1 and Spark 1.2.
> At the sortBy stage, the shuffle write is 98.1MB in Spark 1.1 but 146.9MB in
> Spark 1.2.
> I set the spark.shuffle.manager option to hash because its default value changed
> in 1.2, but Spark 1.2 still writes more shuffle output than Spark 1.1 (a
> configuration sketch follows below).
> This can increase disk I/O overhead considerably as the input file grows, and it
> causes jobs to take more time to complete.
> For an input of about 100GB, for example, the shuffle write is 39.7GB in Spark
> 1.1 but 91.0GB in Spark 1.2.
> spark 1.1
> ||Stage Id||Description||Input||Shuffle Read||Shuffle Write||
> |9|saveAsTextFile| |1169.4KB| |
> |12|combineByKey| |1265.4KB|1275.0KB|
> |6|sortByKey| |1276.5KB| |
> |8|mapPartitions| |91.0MB|1383.1KB|
> |4|apply| |89.4MB| |
> |5|sortBy|155.6MB| |98.1MB|
> |3|sortBy|155.6MB| | |
> |1|collect| |2.1MB| |
> |2|mapValues|155.6MB| |2.2MB|
> |0|first|184.4KB| | |
> spark 1.2
> ||Stage Id||Description||Input||Shuffle Read||Shuffle Write||
> |12|saveAsTextFile| |1170.2KB| |
> |11|combineByKey| |1264.5KB|1275.0KB|
> |8|sortByKey| |1273.6KB| |
> |7|mapPartitions| |134.5MB|1383.1KB|
> |5|zipWithIndex| |132.5MB| |
> |4|sortBy|155.6MB| |146.9MB|
> |3|sortBy|155.6MB| | |
> |2|collect| |2.0MB| |
> |1|mapValues|155.6MB| |2.2MB|
> |0|first|184.4KB| | |