[ https://issues.apache.org/jira/browse/HIVE-17848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Deepak Jaiswal updated HIVE-17848:
----------------------------------
    Attachment: HIVE-17848.4.patch

> Bucket Map Join : Implement an efficient way to minimize loading hash table
> ---------------------------------------------------------------------------
>
>                 Key: HIVE-17848
>                 URL: https://issues.apache.org/jira/browse/HIVE-17848
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Deepak Jaiswal
>            Assignee: Deepak Jaiswal
>            Priority: Major
>         Attachments: HIVE-17848.2.patch, HIVE-17848.4.patch
>
>
> In bucket mapjoin, each task loads its own copy of the hash table. This is
> inefficient because the load is IO heavy, and with multiple copies of the
> same hash table in memory, the tables may get GCed on a busy system.
> Implement a subcache with a soft reference to each hash table, keyed by its
> bucket ID, so that a task can reuse it.
> This needs changes on the Tez side to push the bucket ID to TezProcessor.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
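
The subcache idea in the description can be illustrated with a minimal Java sketch. This is not the code in the attached patches: the class name BucketSubCache, the method getOrLoad, and the loader callback are all hypothetical, and the sketch assumes the bucket ID is already available to the task (the Tez-side change the issue mentions is not shown). It keeps one softly referenced hash table per bucket ID, so a later task for the same bucket can reuse the table unless the garbage collector has reclaimed it.

```java
import java.lang.ref.SoftReference;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.IntFunction;

/**
 * Hypothetical sub-cache: one softly referenced hash table per bucket ID.
 * Illustration only; not taken from the HIVE-17848 patches.
 */
final class BucketSubCache<T> {
    private final ConcurrentMap<Integer, SoftReference<T>> tables =
            new ConcurrentHashMap<>();

    /**
     * Returns the cached table for bucketId, or loads and caches it if it was
     * never loaded or its soft reference has been cleared by the GC.
     * The loader must not touch this cache while it runs.
     */
    T getOrLoad(int bucketId, IntFunction<T> loader) {
        // Strong reference held outside the map so the table cannot be
        // collected between the lookup and the return.
        Object[] holder = new Object[1];
        tables.compute(bucketId, (id, ref) -> {
            T cached = (ref == null) ? null : ref.get();
            if (cached != null) {
                holder[0] = cached;      // reuse the existing table
                return ref;
            }
            T loaded = loader.apply(id); // the expensive, IO-heavy load
            holder[0] = loaded;
            return new SoftReference<>(loaded);
        });
        @SuppressWarnings("unchecked")
        T result = (T) holder[0];
        return result;
    }
}
```

In this sketch, compute() serializes callers per bucket ID, so two tasks asking for the same bucket at the same time trigger a single load. A soft reference leaves the JVM free to drop a table under memory pressure, in which case the next caller simply reloads it.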