[jira] Commented: (HIVE-1700) Optimize JDBM to make mapjoin faster

Alex Boisvert (JIRA) Wed, 01 Dec 2010 13:48:36 -0800

    [ 
https://issues.apache.org/jira/browse/HIVE-1700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12965859#action_12965859
 ]


Alex Boisvert commented on HIVE-1700:
-------------------------------------

Duplicate of HIVE-1702

> Optimize JDBM to make mapjoin faster
> ------------------------------------
>
>                 Key: HIVE-1700
>                 URL: https://issues.apache.org/jira/browse/HIVE-1700
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: He Yongqiang
>
> copied from email:
> From: Joydeep Sen Sarma
> Sent: Tuesday, October 12, 2010 11:11 AM
> To: Yongqiang He; Liyin Tang; Namit Jain
> Subject: RE: Optimize jdbm
> seems like we should move all deserialization to hive land. jdbm should just 
> work on byte arrays for both keys and values. (since the output of the 
> serializer used by hive is byte comparable - that seems to suffice)
> ________________________________________
> From: Yongqiang He
> Sent: Tuesday, October 12, 2010 10:22 AM
> To: Liyin Tang; Namit Jain
> Cc: Joydeep Sen Sarma
> Subject: Optimize jdbm
>   1.  HTree.get() costs 70% of total time. A bloom filter here could help a 
> lot by avoiding unneeded get() calls when we know for sure the given key is 
> not in JDBM. (We can generate the bloom filter when doing the jdbm sink, and 
> read it into memory when doing the read.)
>   2.  HTree.get() deserializes both key and value for each entry until it 
> finds a matching key. We could deserialize only the key, and deserialize the 
> value only once the key matches.
> Any others?
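
The ideas in the quoted thread (byte-array keys and values, a bloom filter in
front of get(), and decoding a value only after its key matches) can be
sketched roughly as below. This is a toy illustration under stated
assumptions, not JDBM or Hive code: GuardedByteTable, Bloom, the linear scan
that stands in for a bucket lookup, and the filter sizes are all hypothetical
names and choices made for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-in for a byte-oriented table; not the JDBM HTree API.
final class GuardedByteTable {

    // Tiny Bloom filter over byte[] keys (sizes are illustrative only).
    static final class Bloom {
        private final BitSet bits;
        private final int size;
        private final int hashes;

        Bloom(int size, int hashes) {
            this.bits = new BitSet(size);
            this.size = size;
            this.hashes = hashes;
        }

        // Double hashing: derive the i-th bit index from two cheap hashes.
        private int index(byte[] key, int i) {
            int h1 = Arrays.hashCode(key);
            int h2 = (h1 >>> 16) | 1; // second hash, forced odd
            return Math.floorMod(h1 + i * h2, size);
        }

        void add(byte[] key) {
            for (int i = 0; i < hashes; i++) {
                bits.set(index(key, i));
            }
        }

        // false means "definitely absent"; true means "possibly present".
        boolean mightContain(byte[] key) {
            for (int i = 0; i < hashes; i++) {
                if (!bits.get(index(key, i))) {
                    return false;
                }
            }
            return true;
        }
    }

    private final List<byte[]> keys = new ArrayList<>();
    private final List<byte[]> values = new ArrayList<>();
    private final Bloom bloom = new Bloom(1 << 16, 3);

    int probes = 0;        // lookups that got past the filter
    int valuesDecoded = 0; // values actually deserialized

    // The filter is populated at write time (the "sink" side).
    void put(byte[] key, byte[] value) {
        keys.add(key);
        values.add(value);
        bloom.add(key);
    }

    // Keys are compared as raw bytes; the value is decoded only on a match.
    <V> V get(byte[] key, Function<byte[], V> decoder) {
        if (!bloom.mightContain(key)) {
            return null; // skip the lookup entirely
        }
        probes++;
        for (int i = 0; i < keys.size(); i++) {
            if (Arrays.equals(keys.get(i), key)) {
                valuesDecoded++;
                return decoder.apply(values.get(i));
            }
        }
        return null; // Bloom false positive
    }

    public static void main(String[] args) {
        GuardedByteTable table = new GuardedByteTable();
        table.put("k1".getBytes(StandardCharsets.UTF_8),
                  "v1".getBytes(StandardCharsets.UTF_8));
        Function<byte[], String> utf8 = b -> new String(b, StandardCharsets.UTF_8);
        System.out.println(table.get("k1".getBytes(StandardCharsets.UTF_8), utf8));
        System.out.println(table.get("zz".getBytes(StandardCharsets.UTF_8), utf8));
        System.out.println("probes=" + table.probes);
    }
}
```

In this sketch the caller supplies the decoder, so deserialization stays on
the caller's side and the table itself only ever handles byte arrays. A real
filter would need to be sized for the expected number of keys and an
acceptable false-positive rate.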

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
