[
https://issues.apache.org/jira/browse/TEZ-978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Bikas Saha updated TEZ-978:
---------------------------
Comment: was deleted
(was: looks good. committing this.)
> Enhance auto parallelism tuning for queries having empty outputs or data
> skewness
> ---------------------------------------------------------------------------------
>
> Key: TEZ-978
> URL: https://issues.apache.org/jira/browse/TEZ-978
> Project: Apache Tez
> Issue Type: Bug
> Affects Versions: 0.4.0
> Reporter: Rajesh Balamohan
> Assignee: Rajesh Balamohan
> Attachments: TEZ-978-v1.patch, TEZ-978-v2.patch, TEZ-978.3.patch,
> TEZ-978.4.patch, TEZ-978.4.wip.patch, TEZ-978.5.patch, TEZ-978.6.patch
>
>
> Running tpcds (query-92) with auto-tuning
> "tez.am.shuffle-vertex-manager.enable.auto-parallel" degraded the performance
> than original run.
> Query has lots of empty outputs and these tasks tend to complete a lot more
> faster than others. Tez computes the parallelism with the given information
> (wherein most of the output is empty) and set the reducers to "1". When
> other tasks complete, single reducer has to do the heavy lifting and this
> causes the performance degradation.
> Map 1: 2/181 Map 5: 16/179 Map 7: 1/1 Map 8: 1/1 Reducer 2:
> 0/109 Reducer 3: 0/137 Reducer 4: 0/1 Reducer 6: 0/166
> Map 1: 2/181 Map 5: 22/179 Map 7: 1/1 Map 8: 1/1 Reducer 2:
> 0/109 Reducer 3: 0/137 Reducer 4: 0/1 Reducer 6: 0/166
> Map 1: 2/181 Map 5: 25/179 Map 7: 1/1 Map 8: 1/1 Reducer 2:
> 0/109 Reducer 3: 0/137 Reducer 4: 0/1 Reducer 6: 0/166
> Map 1: 2/181 Map 5: 30/179 Map 7: 1/1 Map 8: 1/1 Reducer 2:
> 0/109 Reducer 3: 0/137 Reducer 4: 0/1 Reducer 6: 0/166
> Map 1: 2/181 Map 5: 35/179 Map 7: 1/1 Map 8: 1/1 Reducer 2:
> 0/109 Reducer 3: 0/137 Reducer 4: 0/1 Reducer 6: 0/166
> Map 1: 2/181 Map 5: 36/179 Map 7: 1/1 Map 8: 1/1 Reducer 2:
> 0/109 Reducer 3: 0/137 Reducer 4: 0/1 Reducer 6: 0/166
> Map 1: 2/181 Map 5: 39/179 Map 7: 1/1 Map 8: 1/1 Reducer 2:
> 0/109 Reducer 3: 0/137 Reducer 4: 0/1 Reducer 6: 0/166
> Map 1: 3/181 Map 5: 43/179 Map 7: 1/1 Map 8: 1/1 Reducer 2:
> 0/109 Reducer 3: 0/137 Reducer 4: 0/1 Reducer 6: 0/166
> Map 1: 5/181 Map 5: 46/179 Map 7: 1/1 Map 8: 1/1 Reducer 2:
> 0/109 Reducer 3: 0/137 Reducer 4: 0/1 Reducer 6: 0/1 <===
> ShuffleVertexManager changing parallelism
> Map 1: 5/181 Map 5: 63/179 Map 7: 1/1 Map 8: 1/1 Reducer 2:
> 0/109 Reducer 3: 0/137 Reducer 4: 0/1 Reducer 6: 0/1
> Map 1: 7/181 Map 5: 72/179 Map 7: 1/1 Map 8: 1/1 Reducer 2:
> 0/109 Reducer 3: 0/137 Reducer 4: 0/1 Reducer 6: 0/1
> Map 1: 7/181 Map 5: 83/179 Map 7: 1/1 Map 8: 1/1 Reducer 2:
> 0/109 Reducer 3: 0/137 Reducer 4: 0/1 Reducer 6: 0/1
> Map 1: 8/181 Map 5: 95/179 Map 7: 1/1 Map 8: 1/1 Reducer 2:
> 0/109 Reducer 3: 0/137 Reducer 4: 0/1 Reducer 6: 0/1
> Map 1: 8/181 Map 5: 104/179 Map 7: 1/1 Map 8: 1/1 Reducer 2:
> 0/109 Reducer 3: 0/137 Reducer 4: 0/1 Reducer 6: 0/1
> Map 1: 9/181 Map 5: 116/179 Map 7: 1/1 Map 8: 1/1 Reducer 2:
> 0/109 Reducer 3: 0/137 Reducer 4: 0/1 Reducer 6: 0/1
> Map 1: 12/181 Map 5: 123/179 Map 7: 1/1 Map 8: 1/1 Reducer 2:
> 0/109 Reducer 3: 0/137 Reducer 4: 0/1 Reducer 6: 0/1
> Map 1: 13/181 Map 5: 127/179 Map 7: 1/1 Map 8: 1/1 Reducer 2:
> 0/109 Reducer 3: 0/137 Reducer 4: 0/1 Reducer 6: 0/1
> Map 1: 16/181 Map 5: 127/179 Map 7: 1/1 Map 8: 1/1 Reducer 2:
> 0/109 Reducer 3: 0/137 Reducer 4: 0/1 Reducer 6: 0/1
> Map 1: 17/181 Map 5: 128/179 Map 7: 1/1 Map 8: 1/1 Reducer 2:
> 0/109 Reducer 3: 0/137 Reducer 4: 0/1 Reducer 6: 0/1
> Map 1: 18/181 Map 5: 131/179 Map 7: 1/1 Map 8: 1/1 Reducer 2:
> 0/109 Reducer 3: 0/137 Reducer 4: 0/1 Reducer 6: 0/1
> Map 1: 19/181 Map 5: 131/179 Map 7: 1/1 Map 8: 1/1 Reducer 2:
> 0/109 Reducer 3: 0/137 Reducer 4: 0/1 Reducer 6: 0/1
> Map 1: 25/181 Map 5: 132/179 Map 7: 1/1 Map 8: 1/1 Reducer 2:
> 0/109 Reducer 3: 0/137 Reducer 4: 0/1 Reducer 6: 0/1
> Map 1: 33/181 Map 5: 132/179 Map 7: 1/1 Map 8: 1/1 Reducer 2:
> 0/109 Reducer 3: 0/137 Reducer 4: 0/1 Reducer 6: 0/1
> Map 1: 42/181 Map 5: 134/179 Map 7: 1/1 Map 8: 1/1 Reducer 2:
> 0/109 Reducer 3: 0/137 Reducer 4: 0/1 Reducer 6: 0/1
> <=== ShuffleVertexManager changing parallelism
> Map 1: 51/181 Map 5: 135/179 Map 7: 1/1 Map 8: 1/1 Reducer 2:
> 0/1 Reducer 3: 0/137 Reducer 4: 0/1 Reducer 6: 0/1
> Map 1: 58/181 Map 5: 136/179 Map 7: 1/1 Map 8: 1/1 Reducer 2:
> 0/1 Reducer 3: 0/137 Reducer 4: 0/1 Reducer 6: 0/1
> Map 1: 63/181 Map 5: 136/179 Map 7: 1/1 Map 8: 1/1 Reducer 2:
> 0/1 Reducer 3: 0/137 Reducer 4: 0/1 Reducer 6: 0/1
> Map 1: 70/181 Map 5: 136/179 Map 7: 1/1 Map 8: 1/1 Reducer 2:
> 0/1 Reducer 3: 0/137 Reducer 4: 0/1 Reducer 6: 0/1
> Suggestion is to include
> 1. Empty output information when computing auto-parallelism.
> 2. Have a configurable value for determining the average output from the
> source (e.g minimum of 1 MB output from each source). If the average task
> output size does not meet this criteria (which means all the completed tasks
> are small tasks), we can defer the computation of auto-parallelism until
> other tasks are completed.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)