[ https://issues.apache.org/jira/browse/TEZ-3002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15058961#comment-15058961 ]

Hitesh Shah commented on TEZ-3002:
----------------------------------

Moving this JIRA to Hive for now. It would be good to start the discussion there,
as this pertains to Hive regardless of whether it uses MR or Tez as its
internal execution engine.

FWIW, issues such as this are usually better raised on the Hive user mailing
list for discussion and analysis first, with a bug filed later based on the
findings.

> Does Tez run slower than hive on larger dataset (~2.5 TB)?
> ----------------------------------------------------------
>
>                 Key: TEZ-3002
>                 URL: https://issues.apache.org/jira/browse/TEZ-3002
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: rohit garg
>
> We have started testing the Tez query engine. From initial results, we are
> getting a 30% performance boost over Hive on smaller datasets (1-10 GB), but
> Hive starts to perform better than Tez as the data size increases. For
> example, when we run a Hive query with Tez on about 2.3 TB of data, it
> performs about 20% worse than Hive alone (on MapReduce). Details are in the
> post below.
> On a cluster with 1.3 TB of RAM, I set the following properties:
> set tez.task.resource.memory.mb=10000;
> set tez.am.resource.memory.mb=59205;
> set tez.am.launch.cmd-opts=-Xmx47364m;
> set hive.tez.container.size=59205;
> set hive.tez.java.opts=-Xmx47364m;
> set tez.am.grouping.max-size=36700160000;
> Is this normal, or am I missing some property or configuring one
> incorrectly? Also, I am currently using an older version of Tez. Could that
> be the issue too? I still have to bootstrap the latest version of Tez on EMR
> and test whether it does any better.
> I thought I would ask here too:
> http://www.jwplayer.com/blog/hive-with-tez-on-emr/
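The memory values in the reported `set` statements follow a fixed ratio. The short Python sketch here is only an illustration (it is not part of the original report) that checks the arithmetic: the `-Xmx47364m` heap is 80% of the 59205 MB container size, and, assuming `tez.am.grouping.max-size` is given in bytes, 36700160000 works out to exactly 35000 MiB (about 34.2 GiB). The variable and function names are hypothetical; the 0.8 fraction is derived from the reporter's numbers, not taken from Tez or Hive documentation.

```python
# Sanity-check of the memory settings quoted in the report.
# Values are copied from the reporter's `set` statements; names are hypothetical.

MIB = 1024 * 1024  # bytes per MiB

container_mb = 59205              # hive.tez.container.size / tez.am.resource.memory.mb
heap_mb = 47364                   # -Xmx47364m in hive.tez.java.opts / tez.am.launch.cmd-opts
grouping_max_bytes = 36700160000  # tez.am.grouping.max-size (assumed to be in bytes)


def heap_fraction(heap: int, container: int) -> float:
    """Share of the container's memory given to the JVM heap."""
    return heap / container


def bytes_to_mib(n: int) -> float:
    """Convert a byte count to MiB."""
    return n / MIB


if __name__ == "__main__":
    print(f"heap/container    = {heap_fraction(heap_mb, container_mb):.2f}")
    grouping_mib = bytes_to_mib(grouping_max_bytes)
    print(f"grouping max-size = {grouping_mib:.0f} MiB ({grouping_mib / 1024:.1f} GiB)")
```

Run as a script, it reports a heap fraction of 0.80 and a grouping limit of 35000 MiB (34.2 GiB).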



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
