[ https://issues.apache.org/jira/browse/HIVE-12683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15058980#comment-15058980 ]
Gopal V commented on HIVE-12683:
--------------------------------

From your blog - are you using 59 GB Tez containers?

set hive.tez.container.size=59205;

> Does Tez run slower than hive on larger dataset (~2.5 TB)?
> ----------------------------------------------------------
>
>                 Key: HIVE-12683
>                 URL: https://issues.apache.org/jira/browse/HIVE-12683
>             Project: Hive
>          Issue Type: Bug
>            Reporter: rohit garg
>
> We have started to look into testing the Tez query engine. In our initial
> results, Tez gives about a 30% performance boost over Hive on smaller data
> sets (1-10 GB), but Hive starts to perform better than Tez as the data size
> increases. For example, when we run a Hive query with Tez on about 2.3 TB of
> data, it performs about 20% worse than Hive alone. Details are in the post
> below.
> On a cluster with 1.3 TB RAM, I set the following properties:
> set tez.task.resource.memory.mb=10000;
> set tez.am.resource.memory.mb=59205;
> set tez.am.launch.cmd-opts =-Xmx47364m;
> set hive.tez.container.size=59205;
> set hive.tez.java.opts=-Xmx47364m;
> set tez.am.grouping.max-size=36700160000;
> Is this normal, or am I missing or misconfiguring some property? Also, I am
> using an older version of Tez for now. Could that be the issue too? I still
> have to bootstrap the latest version of Tez on EMR and test whether it does
> any better.
> Thought of asking here too.
> http://www.jwplayer.com/blog/hive-with-tez-on-emr/

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
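
The arithmetic behind the container-size question can be sketched in a few lines of Python. This is only a rough sanity check, not a tuning recommendation: the function names are made up for illustration, 1 TB is assumed to mean 1024 * 1024 MB, and the whole 1.3 TB is assumed to be available for containers, which a real YARN cluster would not allow.

```python
# Values copied from the settings quoted in the issue.
CONTAINER_MB = 59205               # hive.tez.container.size
HEAP_MB = 47364                    # -Xmx value in hive.tez.java.opts
GROUPING_MAX_BYTES = 36700160000   # tez.am.grouping.max-size, as written in the issue


def heap_fraction(heap_mb, container_mb):
    """Fraction of the container that is given to the JVM heap."""
    return heap_mb / container_mb


def max_containers(cluster_ram_mb, container_mb):
    """Upper bound on containers of this size that fit in the given RAM."""
    return int(cluster_ram_mb // container_mb)


def bytes_to_mib(num_bytes):
    """Convert a byte count to MiB."""
    return num_bytes / (1024 * 1024)


if __name__ == "__main__":
    # Assumption: 1.3 TB of RAM, binary units, all of it usable for containers.
    cluster_ram_mb = 1.3 * 1024 * 1024
    print("heap / container:", heap_fraction(HEAP_MB, CONTAINER_MB))
    print("max concurrent containers:", max_containers(cluster_ram_mb, CONTAINER_MB))
    print("grouping max size (MiB):", bytes_to_mib(GROUPING_MAX_BYTES))
```

Under those assumptions the heap is 80% of the container, and only a couple of dozen containers of that size could run at once across the whole cluster, which caps task parallelism regardless of how many cores are available.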