Johannes Zillmann created TEZ-2148:
--------------------------------------
Summary: Slow container grabbing with Capacity Scheduler in
comparison to MapReduce
Key: TEZ-2148
URL: https://issues.apache.org/jira/browse/TEZ-2148
Project: Apache Tez
Issue Type: Task
Affects Versions: 0.5.1
Reporter: Johannes Zillmann
A customer experienced the following:
- Set up a CapacityScheduler for user 'company'
- The same processing job on the same data is faster with MapReduce than with
Tez under "normal" cluster load. Only if nothing else runs on Hadoop does Tez
outperform MapReduce. (It's hard to give exact data here since we get all
information second-hand from the customer, but the timings were pretty stable
over a dozen runs: the MapReduce job took about 70 sec and Tez about 170
sec.)
So the question is: is there some difference in how Tez grabs resources from
the capacity scheduler compared to MapReduce?
Looking at the logs, it looks like Tez is always very slow in starting the
containers, whereas MapReduce parallelizes very quickly.
Attached are the client and application logs for the Tez and MapReduce runs, as
well as the scheduler configuration.