[ 
https://issues.apache.org/jira/browse/TEZ-978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081648#comment-14081648
 ] 

Hitesh Shah commented on TEZ-978:
---------------------------------

[~bikassaha], mind taking a look?

> Enhance auto parallelism tuning for queries having empty outputs or data 
> skewness
> ---------------------------------------------------------------------------------
>
>                 Key: TEZ-978
>                 URL: https://issues.apache.org/jira/browse/TEZ-978
>             Project: Apache Tez
>          Issue Type: Bug
>    Affects Versions: 0.4.0
>            Reporter: Rajesh Balamohan
>            Assignee: Rajesh Balamohan
>         Attachments: TEZ-978-v1.patch, TEZ-978-v2.patch
>
>
> Running tpcds (query-92) with auto-tuning 
> "tez.am.shuffle-vertex-manager.enable.auto-parallel" degraded the performance 
> than original run.  
> Query has lots of empty outputs and these tasks tend to complete a lot more 
> faster than others.  Tez computes the parallelism with the given information 
> (wherein most of the output is empty) and set the reducers to "1".  When 
> other tasks complete, single reducer has to do the heavy lifting and this 
> causes the performance degradation.
> Map 1: 2/181    Map 5: 16/179   Map 7: 1/1      Map 8: 1/1      Reducer 2: 
> 0/109        Reducer 3: 0/137        Reducer 4: 0/1  Reducer 6: 0/166
> Map 1: 2/181    Map 5: 22/179   Map 7: 1/1      Map 8: 1/1      Reducer 2: 
> 0/109        Reducer 3: 0/137        Reducer 4: 0/1  Reducer 6: 0/166
> Map 1: 2/181    Map 5: 25/179   Map 7: 1/1      Map 8: 1/1      Reducer 2: 
> 0/109        Reducer 3: 0/137        Reducer 4: 0/1  Reducer 6: 0/166
> Map 1: 2/181    Map 5: 30/179   Map 7: 1/1      Map 8: 1/1      Reducer 2: 
> 0/109        Reducer 3: 0/137        Reducer 4: 0/1  Reducer 6: 0/166
> Map 1: 2/181    Map 5: 35/179   Map 7: 1/1      Map 8: 1/1      Reducer 2: 
> 0/109        Reducer 3: 0/137        Reducer 4: 0/1  Reducer 6: 0/166
> Map 1: 2/181    Map 5: 36/179   Map 7: 1/1      Map 8: 1/1      Reducer 2: 
> 0/109        Reducer 3: 0/137        Reducer 4: 0/1  Reducer 6: 0/166
> Map 1: 2/181    Map 5: 39/179   Map 7: 1/1      Map 8: 1/1      Reducer 2: 
> 0/109        Reducer 3: 0/137        Reducer 4: 0/1  Reducer 6: 0/166
> Map 1: 3/181    Map 5: 43/179   Map 7: 1/1      Map 8: 1/1      Reducer 2: 
> 0/109        Reducer 3: 0/137        Reducer 4: 0/1  Reducer 6: 0/166
> Map 1: 5/181    Map 5: 46/179   Map 7: 1/1      Map 8: 1/1      Reducer 2: 
> 0/109        Reducer 3: 0/137        Reducer 4: 0/1  Reducer 6: 0/1   <=== 
> ShuffleVertexManager changing parallelism 
> Map 1: 5/181    Map 5: 63/179   Map 7: 1/1      Map 8: 1/1      Reducer 2: 
> 0/109        Reducer 3: 0/137        Reducer 4: 0/1  Reducer 6: 0/1
> Map 1: 7/181    Map 5: 72/179   Map 7: 1/1      Map 8: 1/1      Reducer 2: 
> 0/109        Reducer 3: 0/137        Reducer 4: 0/1  Reducer 6: 0/1
> Map 1: 7/181    Map 5: 83/179   Map 7: 1/1      Map 8: 1/1      Reducer 2: 
> 0/109        Reducer 3: 0/137        Reducer 4: 0/1  Reducer 6: 0/1
> Map 1: 8/181    Map 5: 95/179   Map 7: 1/1      Map 8: 1/1      Reducer 2: 
> 0/109        Reducer 3: 0/137        Reducer 4: 0/1  Reducer 6: 0/1
> Map 1: 8/181    Map 5: 104/179  Map 7: 1/1      Map 8: 1/1      Reducer 2: 
> 0/109        Reducer 3: 0/137        Reducer 4: 0/1  Reducer 6: 0/1
> Map 1: 9/181    Map 5: 116/179  Map 7: 1/1      Map 8: 1/1      Reducer 2: 
> 0/109        Reducer 3: 0/137        Reducer 4: 0/1  Reducer 6: 0/1
> Map 1: 12/181   Map 5: 123/179  Map 7: 1/1      Map 8: 1/1      Reducer 2: 
> 0/109        Reducer 3: 0/137        Reducer 4: 0/1  Reducer 6: 0/1
> Map 1: 13/181   Map 5: 127/179  Map 7: 1/1      Map 8: 1/1      Reducer 2: 
> 0/109        Reducer 3: 0/137        Reducer 4: 0/1  Reducer 6: 0/1
> Map 1: 16/181   Map 5: 127/179  Map 7: 1/1      Map 8: 1/1      Reducer 2: 
> 0/109        Reducer 3: 0/137        Reducer 4: 0/1  Reducer 6: 0/1
> Map 1: 17/181   Map 5: 128/179  Map 7: 1/1      Map 8: 1/1      Reducer 2: 
> 0/109        Reducer 3: 0/137        Reducer 4: 0/1  Reducer 6: 0/1
> Map 1: 18/181   Map 5: 131/179  Map 7: 1/1      Map 8: 1/1      Reducer 2: 
> 0/109        Reducer 3: 0/137        Reducer 4: 0/1  Reducer 6: 0/1
> Map 1: 19/181   Map 5: 131/179  Map 7: 1/1      Map 8: 1/1      Reducer 2: 
> 0/109        Reducer 3: 0/137        Reducer 4: 0/1  Reducer 6: 0/1
> Map 1: 25/181   Map 5: 132/179  Map 7: 1/1      Map 8: 1/1      Reducer 2: 
> 0/109        Reducer 3: 0/137        Reducer 4: 0/1  Reducer 6: 0/1
> Map 1: 33/181   Map 5: 132/179  Map 7: 1/1      Map 8: 1/1      Reducer 2: 
> 0/109        Reducer 3: 0/137        Reducer 4: 0/1  Reducer 6: 0/1
> Map 1: 42/181   Map 5: 134/179  Map 7: 1/1      Map 8: 1/1      Reducer 2: 
> 0/109        Reducer 3: 0/137        Reducer 4: 0/1  Reducer 6: 0/1         
> <=== ShuffleVertexManager changing parallelism 
> Map 1: 51/181   Map 5: 135/179  Map 7: 1/1      Map 8: 1/1      Reducer 2: 
> 0/1  Reducer 3: 0/137        Reducer 4: 0/1  Reducer 6: 0/1
> Map 1: 58/181   Map 5: 136/179  Map 7: 1/1      Map 8: 1/1      Reducer 2: 
> 0/1  Reducer 3: 0/137        Reducer 4: 0/1  Reducer 6: 0/1
> Map 1: 63/181   Map 5: 136/179  Map 7: 1/1      Map 8: 1/1      Reducer 2: 
> 0/1  Reducer 3: 0/137        Reducer 4: 0/1  Reducer 6: 0/1
> Map 1: 70/181   Map 5: 136/179  Map 7: 1/1      Map 8: 1/1      Reducer 2: 
> 0/1  Reducer 3: 0/137        Reducer 4: 0/1  Reducer 6: 0/1
> Suggestion is to include 
> 1. Empty output information when computing auto-parallelism.  
> 2. Have a configurable value for determining the average output from the 
> source (e.g minimum of 1 MB output from each source).  If the average task 
> output size does not meet this criteria (which means all the completed tasks 
> are small tasks), we can defer the computation of auto-parallelism until 
> other tasks are completed.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Reply via email to