[
https://issues.apache.org/jira/browse/HIVE-224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12841714#action_12841714
]
Zheng Shao commented on HIVE-224:
---------------------------------
Hi James, we currently don't have the bandwidth to do this, but I guess it
won't be too hard - we just need to use LinkedHashMap:
http://java.sun.com/j2se/1.4.2/docs/api/java/util/LinkedHashMap.html (search
for LRU).
Are you interested in joining forces on this?
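A minimal sketch of the LinkedHashMap approach mentioned above, assuming a fixed entry limit in place of a real memory check. The class name `LruAggregationTable` and the `flushed` list (a stand-in for forwarding a partial aggregate to the reducers) are hypothetical and not part of Hive. Note that an access-ordered LinkedHashMap evicts the least recently used entry, which only approximates the LFU policy in the issue title.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical map-side table that flushes the least recently used entry.
class LruAggregationTable<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    // Stand-in for rows emitted to the reduce step (assumption, not Hive code).
    final List<Map.Entry<K, V>> flushed = new ArrayList<>();

    LruAggregationTable(int maxEntries) {
        // accessOrder = true: iteration order runs from least to most recently accessed.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        if (size() <= maxEntries) {
            return false;
        }
        // Copy the entry before the map removes it, then let the map evict it.
        flushed.add(new AbstractMap.SimpleEntry<>(eldest));
        return true;
    }
}
```

LinkedHashMap calls `removeEldestEntry` after each insertion, so the table never holds more than `maxEntries` keys once the insertion completes.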
> implement lfu based flushing policy for map side aggregates
> -----------------------------------------------------------
>
> Key: HIVE-224
> URL: https://issues.apache.org/jira/browse/HIVE-224
> Project: Hadoop Hive
> Issue Type: Improvement
> Reporter: Joydeep Sen Sarma
>
> Currently we flush some random set of rows when the map-side hash table
> approaches memory limits.
> We have discussed a strategy of flushing the hash table entries that have
> been seen the fewest times (effectively an LFU flushing strategy). This
> will be very effective at reducing the amount of data sent from the map to the
> reduce step, as well as reducing the chances of skew.
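The LFU flushing strategy described in the issue could look roughly like the sketch below. It is an illustration under stated assumptions, not Hive's implementation: the class `LfuFlushingTable`, the `Partial` holder, the fixed `maxEntries` limit (in place of a memory check), the sum aggregate, and the `flushed` list (a stand-in for emitting rows to the reducers) are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical map-side hash table that flushes the least frequently seen keys.
class LfuFlushingTable {
    // Partial aggregate for one key plus the number of times the key was seen.
    static final class Partial {
        long sum;
        long hits;
    }

    private final Map<String, Partial> table = new HashMap<>();
    private final int maxEntries;  // assumed stand-in for the memory limit
    private final int flushCount;  // how many entries to flush when full

    // Stand-in for rows sent to the reduce step, formatted as "key=sum".
    final List<String> flushed = new ArrayList<>();

    LfuFlushingTable(int maxEntries, int flushCount) {
        this.maxEntries = maxEntries;
        this.flushCount = flushCount;
    }

    void add(String key, long value) {
        Partial p = table.get(key);
        if (p == null) {
            // Make room before inserting a new key.
            if (table.size() >= maxEntries) {
                flushLeastFrequent();
            }
            p = new Partial();
            table.put(key, p);
        }
        p.sum += value;
        p.hits++;
    }

    boolean contains(String key) {
        return table.containsKey(key);
    }

    private void flushLeastFrequent() {
        // Sort a snapshot of the entries by hit count, lowest first.
        List<Map.Entry<String, Partial>> entries = new ArrayList<>(table.entrySet());
        entries.sort((a, b) -> Long.compare(a.getValue().hits, b.getValue().hits));
        int n = Math.min(flushCount, entries.size());
        for (int i = 0; i < n; i++) {
            Map.Entry<String, Partial> victim = entries.get(i);
            flushed.add(victim.getKey() + "=" + victim.getValue().sum);
            table.remove(victim.getKey());
        }
    }
}
```

Sorting the whole table on every flush costs O(n log n); flushing a batch of entries at a time (a larger `flushCount`) spreads that cost over many insertions.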