[ https://issues.apache.org/jira/browse/HIVE-224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12843878#action_12843878 ]
James Warren commented on HIVE-224:
-----------------------------------

Unfortunately I have bandwidth limitations myself -- but when (if?) my queue clears I'll be happy to give it a go.

cheers,
-James

> implement lfu based flushing policy for map side aggregates
> -----------------------------------------------------------
>
>                 Key: HIVE-224
>                 URL: https://issues.apache.org/jira/browse/HIVE-224
>             Project: Hadoop Hive
>          Issue Type: Improvement
>            Reporter: Joydeep Sen Sarma
>
> currently we flush some random set of rows when the map side hash table
> approaches memory limits.
> we have discussed a strategy of flushing hash table entries that have
> been seen the least number of times (effectively LFU flushing strategy). This
> will be very effective at reducing the amount of data sent from map to reduce
> step - as well as reduce the chances for any skews.
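The LFU flushing strategy described in the issue can be sketched as follows. This is a minimal illustration under stated assumptions and is not Hive's GroupByOperator code. The class `LfuMapSideAggregator`, the `maxEntries` cap (a stand-in for the real memory limit), the `flushFraction` parameter and the SUM aggregate are all hypothetical. Each group key tracks how many rows have hit it. When the table is full, the entries with the fewest hits are emitted toward the reduce step instead of a random set of rows.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Hive code: map-side partial SUM aggregation whose
// hash table flushes its least-frequently-seen entries when it fills up.
public class LfuMapSideAggregator {
    // Partial aggregate for one group key plus how many rows have hit it.
    static final class Partial {
        long hits;
        long sum;
    }

    private final Map<String, Partial> table = new HashMap<>();
    private final int maxEntries;        // assumed stand-in for the memory limit
    private final double flushFraction;  // assumed share of entries flushed per pass
    // Stand-in for the collector that ships partials to the reduce step.
    private final List<String> emitted = new ArrayList<>();

    public LfuMapSideAggregator(int maxEntries, double flushFraction) {
        this.maxEntries = maxEntries;
        this.flushFraction = flushFraction;
    }

    public void add(String key, long value) {
        Partial p = table.get(key);
        if (p == null) {
            if (table.size() >= maxEntries) {
                flushLeastFrequent();  // make room before inserting a new key
            }
            p = new Partial();
            table.put(key, p);
        }
        p.hits++;
        p.sum += value;
    }

    // LFU flush: emit and drop the entries with the fewest hits, so keys that
    // keep recurring stay in memory and keep absorbing rows on the map side.
    private void flushLeastFrequent() {
        int n = Math.min(table.size(), Math.max(1, (int) (table.size() * flushFraction)));
        List<Map.Entry<String, Partial>> byHits = new ArrayList<>(table.entrySet());
        byHits.sort(Comparator.comparingLong(e -> e.getValue().hits));
        for (int i = 0; i < n; i++) {
            String key = byHits.get(i).getKey();
            emit(key, table.remove(key));
        }
    }

    // At the end of the map task every remaining partial is flushed.
    public void close() {
        for (Map.Entry<String, Partial> e : table.entrySet()) {
            emit(e.getKey(), e.getValue());
        }
        table.clear();
    }

    private void emit(String key, Partial p) {
        emitted.add(key + "=" + p.sum);
    }

    public List<String> emitted() { return emitted; }

    public boolean contains(String key) { return table.containsKey(key); }

    public static void main(String[] args) {
        LfuMapSideAggregator agg = new LfuMapSideAggregator(3, 0.5);
        agg.add("hot", 1); agg.add("hot", 1); agg.add("hot", 1);  // 3 hits
        agg.add("a", 10);                                         // 1 hit
        agg.add("b", 20); agg.add("b", 5);                        // 2 hits
        agg.add("c", 30);  // table full: the 1-hit key "a" is flushed first
        System.out.println(agg.emitted());  // prints [a=10]
    }
}
```

Sorting the whole table on every flush costs O(n log n). Flushing a fraction of the entries per pass amortizes that cost. A production version might instead keep entries in frequency buckets so the least-frequent victim can be found without a full sort.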