[ https://issues.apache.org/jira/browse/HIVE-17848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Deepak Jaiswal updated HIVE-17848:
----------------------------------
Attachment: HIVE-17848.4.patch
> Bucket Map Join : Implement an efficient way to minimize loading hash table
> ---------------------------------------------------------------------------
>
> Key: HIVE-17848
> URL: https://issues.apache.org/jira/browse/HIVE-17848
> Project: Hive
> Issue Type: Bug
> Reporter: Deepak Jaiswal
> Assignee: Deepak Jaiswal
> Priority: Major
> Attachments: HIVE-17848.2.patch, HIVE-17848.4.patch
>
>
> In bucket map join, each task loads its own copy of the hash table. This is
> inefficient because the load is I/O-heavy, and because multiple copies of
> the same hash table are held, the tables may get GCed on a busy system.
> Implement a sub-cache holding a soft reference to each hash table, keyed by
> its bucket ID, so that a loaded table can be reused by a task.
> This requires changes on the Tez side to push the bucket ID to TezProcessor.
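
For illustration only, the sub-cache idea described above could be sketched roughly as follows. This is a minimal sketch and not the code in the attached patch: the class name BucketTableCache, the method getOrLoad, and the loader callback are all hypothetical, and the sketch assumes the bucket ID is already available to the task.

```java
import java.lang.ref.SoftReference;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Supplier;

// Hypothetical sub-cache: one softly referenced hash table per bucket ID.
final class BucketTableCache<T> {
    private final ConcurrentMap<Integer, SoftReference<T>> tables =
        new ConcurrentHashMap<>();

    // Returns the cached table for bucketId, or loads and caches it if the
    // entry is missing or its soft reference has been cleared by the GC.
    // The loader must not touch this cache, since it runs inside compute().
    T getOrLoad(int bucketId, Supplier<T> loader) {
        // One-element holder so the lambda can hand back a strong reference.
        Object[] strong = new Object[1];
        tables.compute(bucketId, (id, ref) -> {
            T cached = (ref == null) ? null : ref.get();
            if (cached != null) {
                strong[0] = cached;      // reuse the table already in memory
                return ref;
            }
            T loaded = loader.get();     // the expensive, I/O-heavy load
            strong[0] = loaded;
            return new SoftReference<>(loaded);
        });
        @SuppressWarnings("unchecked")
        T result = (T) strong[0];
        return result;
    }
}
```

In this sketch, tasks working on the same bucket share one loaded table for as long as it stays reachable, while the soft reference still lets the JVM reclaim it under memory pressure, in which case the next caller simply reloads it.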