[ 
https://issues.apache.org/jira/browse/IMPALA-7635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17374804#comment-17374804
 ] 

Amogh Margoor commented on IMPALA-7635:
---------------------------------------

Review Request: https://gerrit.cloudera.org/#/c/17592/

Benchmark Results: 

Machine Info: Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz Note: Benchmark name are 
in format <name>_XX_YY: name represents the name of benchmark 
(probe|build|memory). XX represents the number of rows in the dataset. YY 
represents the percentage of unique values in dataset. Runtime Benchmark

 
|Memory Benchmark| | | | | | | | | | | | | | | | |
| | | | | | | | | | | | | | | | | |
| | | | | | | | | | | | | | | | | |
|Benchmark|Bytes Consumed without Changes| |Bytes Consumed with IMPALA-7635| | 
|% reduction| | | | | | | | | | |
|memory_1048576_100|16777216| |12582912| | |25| | | | | | | | | | |
|memory_1048576_60|23488096| |16777212| | |28.57142614| | | | | | | | | | |
|memory_1048576_20|30198976| |20971512| | |30.55555261| | | | | | | | | | |
|memory_4194304_100|67108864| |50331648| | |25| | | | | | | | | | |
|memory_4194304_60|93952416| |67108868| | |28.57142918| | | | | | | | | | |
|memory_4194304_20|120795968| |83886088| | |30.55555629| | | | | | | | | | |
| | | | | | | | | | | | | | | | | |
| | | | | | | | | | | | | | | | | |
|Runtime Benchmark| | | | | | | | | | | | | | | | |
| | | | | | | | | | | | | | | | | |
| | |Num Iterations without changes| | | |Num Iterations with IMPALA-7635| | | 
|(IMPALA-7635 Changes / Without Changes)|
| | | | | | | | | | | | | | |
| | |10%ile|50%ile|90%ile| | | |10%ile|50%ile|90%ile| | | |10%ile X|50%ile 
X|90%ile X|
|build_65536_100| |2.25|3.74|3.81| | | |4.04|4.35|4.35| | | 
|1.795555556|1.163101604|1.141732283|
| | | | | | | | | | | | | | | | | |
|build_65536_60| |7.31|9.19|9.37| | | |8.27|9.09|9.26| | | 
|1.131326949|0.9891186072|0.9882604055|
| | | | | | | | | | | | | | | | | |
|build_65536_20| |8.57|9.08|9.44| | | |8.78|9|9.3| | | 
|1.024504084|0.9911894273|0.9851694915|
| | | | | | | | | | | | | | | | | |
|build_262144_100| |0.171|0.212|0.305| | | |0.386|0.407|0.407| | | 
|2.257309942|1.919811321|1.33442623|
| | | | | | | | | | | | | | | | | |
|build_262144_60| |0.197|0.209|0.219| | | |0.316|0.407|0.415| | | 
|1.604060914|1.947368421|1.894977169|
| | | | | | | | | | | | | | | | | |
|build_262144_20| |0.212|0.305|0.316| | | |0.295|0.31|0.316| | | 
|1.391509434|1.016393443|1|
| | | | | | | | | | | | | | | | | |
| | | | | | | | | | | | | | | | | |
| | | | | | | | | | | | | | | | | |
|probe_65536_100| |2.14|2.18|2.64| | | |2.64|2.64|2.69| | | 
|1.23364486|1.211009174|1.018939394|
| | | | | | | | | | | | | | | | | |
|probe_65536_60| |5.8|5.96|5.96| | | |5.1|5.96|6.08| | | 
|0.8793103448|1|1.020134228|
| | | | | | | | | | | | | | | | | |
|probe_65536_20| |6.67|6.96|6.96| | | |6.2|6.35|6.35| | | 
|0.9295352324|0.9123563218|0.9123563218|
| | | | | | | | | | | | | | | | | |
|probe_262144_100| |0.222|0.226|0.23| | | |0.233|0.237|0.237| | | 
|1.04954955|1.048672566|1.030434783|
| | | | | | | | | | | | | | | | | |
|probe_262144_60| |0.233|0.241|0.246| | | |0.237|0.25|0.25| | | 
|1.017167382|1.037344398|1.016260163|
| | | | | | | | | | | | | | | | | |
|probe_262144_20| |0.246|0.25|0.255| | | |0.255|0.259|0.259| | | 
|1.036585366|1.036|1.015686275|
| | | | | | | | | | | | | | | | | |
|probe_65536_absentkeys| |16.3|16.3|17.6| | | |20.1|21.6|21.6| | | 
|1.233128834|1.325153374|1.227272727|
| | | | | | | | | | | | | | | | | |
|probe_262144_absentkeys| |0.52|0.526|0.526| | | |0.727|0.741|0.741| | | 
|1.398076923|1.408745247|1.408745247|

> Reduce size of hash tables in-memory by packing buckets more densely
> --------------------------------------------------------------------
>
>                 Key: IMPALA-7635
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7635
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>    Affects Versions: Impala 3.1.0
>            Reporter: Tim Armstrong
>            Assignee: Amogh Margoor
>            Priority: Major
>              Labels: perf
>
> Currently the hash tables used for hash join and aggregation use 16 bytes per 
> bucket and 24 bytes per additional duplicate for a key:
> {code}
>   /// Linked list of entries used for duplicates.
>   struct DuplicateNode {
>     /// Used for full outer and right {outer, anti, semi} joins. Indicates 
> whether the
>     /// row in the DuplicateNode has been matched.
>     /// From an abstraction point of view, this is an awkward place to store 
> this
>     /// information.
>     /// TODO: Fold this flag in the next pointer below.
>     bool matched;
>     /// Chain to next duplicate node, NULL when end of list.
>     DuplicateNode* next;
>     HtData htdata;
>   };
>   struct Bucket {
>     /// Whether this bucket contains a vaild entry, or it is empty.
>     bool filled;
>     /// Used for full outer and right {outer, anti, semi} joins. Indicates 
> whether the
>     /// row in the bucket has been matched.
>     /// From an abstraction point of view, this is an awkward place to store 
> this
>     /// information but it is efficient. This space is otherwise unused.
>     bool matched;
>     /// Used in case of duplicates. If true, then the bucketData union should 
> be used as
>     /// 'duplicates'.
>     bool hasDuplicates;
>     /// Cache of the hash for data.
>     /// TODO: Do we even have to cache the hash value?
>     uint32_t hash;
>     /// Either the data for this bucket or the linked list of duplicates.
>     union {
>       HtData htdata;
>       DuplicateNode* duplicates;
>     } bucketData;
>   };
> {code}
> There are some comments in the code that suggest folding the boolean values 
> into the upper bits of the pointers (since on amd64 the address space is only 
> 48 bits, but moving to 57 bits apparently - see 
> https://software.intel.com/sites/default/files/managed/2b/80/5-level_paging_white_paper.pdf).
>  That would reduce the bucket to 12 bytes of actual data.
> This would give us the opportunity to reduce memory requirements of joins and 
> the pressure on caches significantly, provided we can work out the 
> implementation issues and the cost of the bit manipulation doesn't exceed the 
> benefit (my intuition is that cache effects are way more important but I 
> could be wrong).
> Here's a rough idea of what we could do:
> # Implement folding of booleans into the pointer and mark struct Bucket as 
> packed so that it doesn't just undo the work with additional padding.
> # Modifying Hashtable to work with the new bucket structure. This needs a 
> little thought since the bucket allocations must be a power-of-two size in 
> bytes, but we also need the hash table entries to be a power-of-two in order 
> for masking the hash to get the bucket number to work. I think either we 
> could just leave wasted space in the buffer or switch to a non-power-of-two 
> number of buckets and using an alternative method of getting the bucket from 
> the hash: 
> https://lemire.me/blog/2016/06/27/a-fast-alternative-to-the-modulo-reduction/
> # Run benchmarks to see if it's beneficial. The effect probably depends on 
> the data set size.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to