[ 
https://issues.apache.org/jira/browse/HIVE-17848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepak Jaiswal updated HIVE-17848:
----------------------------------
    Attachment: HIVE-17848.4.patch

> Bucket Map Join : Implement an efficient way to minimize loading hash table
> ---------------------------------------------------------------------------
>
>                 Key: HIVE-17848
>                 URL: https://issues.apache.org/jira/browse/HIVE-17848
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Deepak Jaiswal
>            Assignee: Deepak Jaiswal
>            Priority: Major
>         Attachments: HIVE-17848.2.patch, HIVE-17848.4.patch
>
>
> In a bucket map join, each task loads its own copy of the hash table. This is
> inefficient because loading is I/O-heavy, and with multiple copies of the same
> hash table in memory, the tables may be garbage-collected on a busy system.
> Implement a sub-cache that holds a soft reference to each hash table, keyed by
> its bucket ID, so that a task can reuse a table that is already loaded.
> This requires changes on the Tez side to push the bucket ID to TezProcessor.
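
For illustration only, the sub-cache idea described above could look roughly
like the sketch below. This is not the code in the attached patch. The class
name BucketTableCache, the get method, and the loader callback are
hypothetical names chosen for the example, and the sketch assumes the bucket
ID is already available to the task as an int.

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntFunction;

/**
 * Hypothetical per-bucket cache: one softly referenced table per bucket ID.
 * T stands in for whatever hash table type the join uses.
 */
final class BucketTableCache<T> {
    // bucket ID -> soft reference to the table loaded for that bucket
    private final Map<Integer, SoftReference<T>> tables = new HashMap<>();

    /**
     * Returns the cached table for bucketId, or calls loader to build it
     * if no table was cached or the garbage collector cleared it.
     */
    synchronized T get(int bucketId, IntFunction<T> loader) {
        SoftReference<T> ref = tables.get(bucketId);
        T table = (ref == null) ? null : ref.get();
        if (table == null) {
            // Cache miss: load once and keep only a soft reference, so the
            // JVM may still reclaim the table under memory pressure.
            table = loader.apply(bucketId);
            tables.put(bucketId, new SoftReference<>(table));
        }
        // The caller receives a strong reference, which keeps the table
        // alive for as long as the task is using it.
        return table;
    }
}
```

Two tasks that ask for the same bucket ID through the same cache instance
would share one table as long as it has not been collected. The single lock
keeps the sketch short but also serializes loads for different buckets; a
real implementation would likely want finer-grained locking.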



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
