[jira] [Commented] (HIVE-7617) optimize bytes mapjoin hash table read path wrt serialization, at least for common cases

Sergey Shelukhin (JIRA) Tue, 12 Aug 2014 19:27:12 -0700

    [ 
https://issues.apache.org/jira/browse/HIVE-7617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095057#comment-14095057
 ]


Sergey Shelukhin commented on HIVE-7617:
----------------------------------------

I've done some unscientific perf testing and some profiling. The join appears 
to be a much smaller part of the cost than the initial profiling suggested, 
but it's still not insignificant.
I ran the same simple query with mapjoin on the TPC-DS 200 scale dataset, on 
the same cluster, with container reuse for each separate configuration. 
Without the optimized hashtable, 18 runs take 9.63 sec. on average; the 
average of the last 12 runs is 8.39 sec. The optimized (for size ;)) hashtable 
with no int reader (but with the other small perf changes from this patch) 
takes 10.35/8.82; with this patch it takes 9.37/8.48.
I can see in the profiler dumps (separate from the above runs) that the cost 
of key serialization is gone from the query.

> optimize bytes mapjoin hash table read path wrt serialization, at least for 
> common cases
> ----------------------------------------------------------------------------------------
>
>                 Key: HIVE-7617
>                 URL: https://issues.apache.org/jira/browse/HIVE-7617
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>         Attachments: HIVE-7617.01.patch, HIVE-7617.patch, 
> HIVE-7617.prelim.patch
>
>
> The BytesBytes hash table stores keys in a byte array for compact 
> representation; however, that means the straightforward implementation of 
> lookups serializes lookup keys to byte arrays, which is relatively expensive.
> We can either shortcut hashcode and compare for common types on the read path 
> (integral types, which would cover most real-world keys), or specialize the 
> hashtable and, from BytesBytes..., create LongBytes, StringBytes, or whatever. 
> The first option seems simpler for now.
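
To make the first option in the description concrete, here is a rough sketch of the idea, not the code from the attached patches. It assumes a toy open-addressing table whose keys are 8-byte big-endian longs kept in one byte array; the class LongKeyByteTable and all of its methods are hypothetical names invented for this illustration and are not Hive APIs. getViaSerialization shows the straightforward path (serialize the lookup key, then hash and compare bytes), while getDirect computes the same hash from the long itself and compares it against the stored bytes without allocating a temporary array.

```java
/** Hypothetical sketch only; this is not Hive's hash table or the HIVE-7617 patch. */
final class LongKeyByteTable {
    private static final int KEY_LEN = 8;
    private final byte[] keys;     // slot i stores its key at [i * 8, i * 8 + 8), big-endian
    private final long[] values;
    private final boolean[] used;
    private final int mask;

    /** Capacity must be a power of two; there is no resizing, so keep the table from filling up. */
    LongKeyByteTable(int capacityPow2) {
        keys = new byte[capacityPow2 * KEY_LEN];
        values = new long[capacityPow2];
        used = new boolean[capacityPow2];
        mask = capacityPow2 - 1;
    }

    /** Hash computed over the serialized key bytes. */
    static int hashBytes(byte[] b, int off) {
        int h = 1;
        for (int i = 0; i < KEY_LEN; i++) h = 31 * h + (b[off + i] & 0xff);
        return h;
    }

    /** The same hash computed directly from the long, with no byte array involved. */
    static int hashLong(long k) {
        int h = 1;
        for (int shift = 56; shift >= 0; shift -= 8) h = 31 * h + (int) ((k >>> shift) & 0xff);
        return h;
    }

    static void writeLong(byte[] b, int off, long k) {
        for (int i = 0; i < KEY_LEN; i++) b[off + i] = (byte) (k >>> (56 - 8 * i));
    }

    static long readLong(byte[] b, int off) {
        long k = 0;
        for (int i = 0; i < KEY_LEN; i++) k = (k << 8) | (b[off + i] & 0xff);
        return k;
    }

    void put(long key, long value) {
        int slot = hashLong(key) & mask;
        while (used[slot] && readLong(keys, slot * KEY_LEN) != key) slot = (slot + 1) & mask;
        used[slot] = true;
        writeLong(keys, slot * KEY_LEN, key);
        values[slot] = value;
    }

    /** Straightforward lookup: serialize the key, then hash and compare as bytes. */
    Long getViaSerialization(long key) {
        byte[] probe = new byte[KEY_LEN];          // per-lookup allocation and copy
        writeLong(probe, 0, key);
        int slot = hashBytes(probe, 0) & mask;
        while (used[slot]) {
            int off = slot * KEY_LEN;
            boolean same = true;
            for (int i = 0; i < KEY_LEN && same; i++) same = keys[off + i] == probe[i];
            if (same) return values[slot];
            slot = (slot + 1) & mask;              // linear probing
        }
        return null;
    }

    /** Shortcut lookup: hash and compare the long directly against the stored bytes. */
    Long getDirect(long key) {
        int slot = hashLong(key) & mask;
        while (used[slot]) {
            if (readLong(keys, slot * KEY_LEN) == key) return values[slot];
            slot = (slot + 1) & mask;
        }
        return null;
    }
}
```

The shortcut only works if hashLong agrees with hashBytes applied to the serialized form of the same key; otherwise the two paths would probe different slots. Both lookups return null for a missing key.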



--
This message was sent by Atlassian JIRA
(v6.2#6252)
