[ https://issues.apache.org/jira/browse/HIVE-1700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12965859#action_12965859 ]
Alex Boisvert commented on HIVE-1700:
-------------------------------------

Duplicate of HIVE-1702

> Optimize JDBM to make mapjoin faster
> ------------------------------------
>
> Key: HIVE-1700
> URL: https://issues.apache.org/jira/browse/HIVE-1700
> Project: Hive
> Issue Type: Improvement
> Reporter: He Yongqiang
>
> copied from email:
>
> From: Joydeep Sen Sarma
> Sent: Tuesday, October 12, 2010 11:11 AM
> To: Yongqiang He; Liyin Tang; Namit Jain
> Subject: RE: Optimize jdbm
>
> Seems like we should move all deserialization to Hive land. JDBM should just
> work on byte arrays for both keys and values. (Since the output of the
> serializer used by Hive is byte comparable, that seems to suffice.)
> ________________________________________
> From: Yongqiang He
> Sent: Tuesday, October 12, 2010 10:22 AM
> To: Liyin Tang; Namit Jain
> Cc: Joydeep Sen Sarma
> Subject: Optimize jdbm
>
> 1. HTree.get() costs 70% of total time. It could help a lot to have a bloom
> filter here, to avoid an unneeded get() when we know for sure the given key is
> not in JDBM. (We can generate the bloom filter when doing the JDBM sink, and
> read it into memory when doing the read.)
> 2. HTree.get() deserializes both key and value for each entry until it finds a
> matching key. We could deserialize only the key, and deserialize the value
> only once the key matches.
>
> Any others?

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
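
For illustration only, the two suggestions in the quoted email (a bloom filter in front of get(), and comparing raw key bytes so that the value is deserialized only on a match) might be combined roughly as below. This is a minimal sketch under stated assumptions, not JDBM's HTree and not Hive code: the class `BloomGuardedStore`, its methods, the FNV-style hash, and the in-memory bucket list standing in for the on-disk table are all hypothetical. It assumes keys are unique and that serialized keys can be compared byte for byte, as the email says Hive's serializer output allows.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

/** Hypothetical sketch: a byte-array keyed table guarded by a Bloom filter. */
class BloomGuardedStore {
    private static final int NUM_BUCKETS = 16;

    private final BitSet bits;
    private final int numBits;
    private final int numHashes;
    // Stand-in for the on-disk hash table: each entry is {keyBytes, valueBytes}.
    private final List<List<byte[][]>> buckets = new ArrayList<>();

    int probes = 0;                // lookups that actually reached the table
    int valueDeserializations = 0; // values turned back into objects

    BloomGuardedStore(int numBits, int numHashes) {
        this.numBits = numBits;
        this.numHashes = numHashes;
        this.bits = new BitSet(numBits);
        for (int i = 0; i < NUM_BUCKETS; i++) {
            buckets.add(new ArrayList<>());
        }
    }

    // FNV-1a style hash over the raw bytes; the seed gives a second hash function.
    private static int hash(byte[] data, int seed) {
        int h = 0x811C9DC5 ^ seed;
        for (byte b : data) {
            h ^= (b & 0xff);
            h *= 0x01000193;
        }
        return h;
    }

    // Double hashing: the i-th bit position is derived from two base hashes.
    private int bitIndex(byte[] key, int i) {
        return Math.floorMod(hash(key, 0) + i * hash(key, 0x9E3779B9), numBits);
    }

    private List<byte[][]> bucketFor(byte[] key) {
        return buckets.get(Math.floorMod(hash(key, 0), NUM_BUCKETS));
    }

    /** Sink side: store serialized bytes and record the key in the filter. */
    void put(byte[] key, byte[] value) {
        for (int i = 0; i < numHashes; i++) {
            bits.set(bitIndex(key, i));
        }
        bucketFor(key).add(new byte[][] {key.clone(), value.clone()});
    }

    /** False means the key is definitely absent; true means it may be present. */
    boolean mightContain(byte[] key) {
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(bitIndex(key, i))) {
                return false;
            }
        }
        return true;
    }

    /** Read side: skip the table when the filter rules the key out. */
    String get(byte[] key) {
        if (!mightContain(key)) {
            return null;
        }
        probes++;
        for (byte[][] entry : bucketFor(key)) {
            // Compare serialized key bytes directly; no key or value object is built.
            if (Arrays.equals(entry[0], key)) {
                valueDeserializations++;
                return new String(entry[1], StandardCharsets.UTF_8);
            }
        }
        return null; // the filter gave a false positive
    }
}
```

In this sketch a lookup for an absent key normally returns before touching the table, and a lookup that does scan a bucket builds a value object only for the matching entry. The filter size and hash count here are arbitrary; a real filter would be sized from the expected number of keys and the acceptable false-positive rate.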