[jira] [Created] (TEZ-3002) Does Tez run slower than hive on larger dataset (~2.5 TB)?

rohit garg (JIRA) Tue, 15 Dec 2015 13:40:54 -0800

rohit garg created TEZ-3002:
-------------------------------

             Summary: Does Tez run slower than hive on larger dataset (~2.5 TB)?
                 Key: TEZ-3002
                 URL: https://issues.apache.org/jira/browse/TEZ-3002
             Project: Apache Tez
          Issue Type: Bug
            Reporter: rohit garg



We have started testing the Tez query engine. In our initial results, Tez
gives about a 30% performance boost over Hive on smaller data sets (1-10 GB),
but Hive starts to perform better than Tez as the data size increases. For
example, when we run a Hive query with Tez on about 2.3 TB of data, it
performs about 20% worse than Hive alone. Details are in the post linked below.

On a cluster with 1.3 TB of RAM, I set the following properties:

set tez.task.resource.memory.mb=10000;
set tez.am.resource.memory.mb=59205;
set tez.am.launch.cmd-opts =-Xmx47364m;
set hive.tez.container.size=59205;
set hive.tez.java.opts=-Xmx47364m;
set tez.am.grouping.max-size=36700160000;
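
As a rough sanity check on these numbers, the arithmetic can be sketched in a few lines of Python. This is only a back-of-the-envelope estimate, not a model of how Tez or YARN actually schedules work. It assumes binary units (1 TB = 1024^4 bytes), that all 1.3 TB of cluster RAM is available for containers, that hive.tez.container.size determines the task container size, and that the grouping max-size acts as an upper bound on the size of one grouped split. The function names are hypothetical helpers introduced for illustration.

```python
# Values copied from the settings in the report.
CONTAINER_MB = 59205                   # hive.tez.container.size
HEAP_MB = 47364                        # -Xmx value in hive.tez.java.opts
GROUPING_MAX_BYTES = 36_700_160_000    # tez.am.grouping.max-size, as written in the report

# Assumptions, not from the report: binary units, all cluster RAM usable.
CLUSTER_RAM_MB = int(1.3 * 1024 * 1024)   # 1.3 TB expressed in MB
DATA_BYTES = int(2.3 * 1024 ** 4)         # 2.3 TB expressed in bytes


def heap_fraction(heap_mb: int, container_mb: int) -> float:
    """Share of the container's memory that is given to the JVM heap."""
    return heap_mb / container_mb


def max_concurrent_containers(cluster_mb: int, container_mb: int) -> int:
    """Upper bound on containers of this size that fit in cluster memory."""
    return cluster_mb // container_mb


def min_grouped_splits(data_bytes: int, max_group_bytes: int) -> int:
    """Fewest splits possible if every split were at the maximum size."""
    return -(-data_bytes // max_group_bytes)  # ceiling division


if __name__ == "__main__":
    print("heap / container:", round(heap_fraction(HEAP_MB, CONTAINER_MB), 3))
    print("grouping max-size in MiB:", GROUPING_MAX_BYTES // 2 ** 20)
    print("containers that fit:", max_concurrent_containers(CLUSTER_RAM_MB, CONTAINER_MB))
    print("minimum grouped splits:", min_grouped_splits(DATA_BYTES, GROUPING_MAX_BYTES))
```

Under these assumptions, only about 23 containers of this size fit in memory at once, and the grouping limit of 35000 MiB allows as few as 69 splits for the whole data set. If that reading of the settings is right, the available parallelism at 2.3 TB is quite low, which may be worth ruling out before comparing against Hive alone.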

Is this normal, or am I missing a property or misconfiguring one? Also, I am
currently using an older version of Tez. Could that be the issue too? I still
need to bootstrap the latest version of Tez on EMR and test whether it does
any better.

http://www.jwplayer.com/blog/hive-with-tez-on-emr/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
