[ https://issues.apache.org/jira/browse/HIVE-135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jeff Hammerbacher updated HIVE-135: ----------------------------------- Component/s: Query Processor Adding to "Query Processor" component. > need more accurate way of tracking memory consumption on map side aggregates > ---------------------------------------------------------------------------- > > Key: HIVE-135 > URL: https://issues.apache.org/jira/browse/HIVE-135 > Project: Hadoop Hive > Issue Type: Bug > Components: Query Processor > Reporter: Joydeep Sen Sarma > > from email thread: > Just trying it out - I am confused by one thing: > > hive> set hive.map.aggr=true; > set hive.map.aggr=true; > hive> explain from mytable u insert overwrite directory > '/user/jssarma/tmp_agg' select u.a, avg(size(u.b)) group by u.a; > everything looks good. Now I submit this query and this is what I see on the > tracker: > Map input records 87,912,961 0 87,912,961 > Map output records 87,912,960 0 87,912,960 > This doesn't make sense. With map-side aggregates - we should be getting > vastly reduced number of rows emitted from mapper. > I am wondering whether we should rethink our flushing logic. The freeMemory() > call is not reliable (since it doesn't account for stuff that's not cleaned > out by GC). Perhaps we should switch to an explicit setting for amount of > memory for hash tables (we do know the size of each hash table entry and > overall size and should be able to guess reasonably). From what Dhruba > reported - there's no way to call the garbage collector and wait for it to > complete (to get a more accurate report of free memory). so the whole route > of obtaining free memory seems a little hosed. > by way of comparison - hadoop also estimates memory usage in sorting. there - > the sort run is just stored in a sequential stream and it just takes the size > of the stream and compares it to max allowed sort memory usage (which is a > configuration option) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.