[
https://issues.apache.org/jira/browse/TEZ-2148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Johannes Zillmann updated TEZ-2148:
-----------------------------------
Attachment: capacity-scheduler.xml
applicationLogs.zip
client-tez.log
client-mapreduce.log
> Slow container grabbing with Capacity Scheduler in comparison to MapReduce
> --------------------------------------------------------------------------
>
> Key: TEZ-2148
> URL: https://issues.apache.org/jira/browse/TEZ-2148
> Project: Apache Tez
> Issue Type: Task
> Affects Versions: 0.5.1
> Reporter: Johannes Zillmann
> Attachments: applicationLogs.zip, capacity-scheduler.xml,
> client-mapreduce.log, client-tez.log
>
>
> A customer experienced the following:
> - Set up a CapacityScheduler for user 'company'
> - The same processing job on the same data is faster with MapReduce than with
> Tez under "normal" cluster load. Tez outperforms MapReduce only if nothing
> else is running on Hadoop. (It's hard to give exact data here since we get all
> information second hand from the customer, but the timings were fairly stable
> over a dozen runs: the MapReduce job took about 70 sec and Tez about 170
> sec.)
> So the question is: is there some difference in how Tez grabs resources
> from the capacity scheduler compared to MapReduce?
> Looking at the logs, it looks like Tez is always very slow in starting its
> containers, whereas MapReduce parallelizes very quickly.
> Attached are the client and application logs for the Tez and MapReduce runs,
> as well as the scheduler configuration.