Prasanth Jayachandran created HIVE-17220:
--------------------------------------------
Summary: Bloomfilter probing in semijoin reduction is thrashing L1 dcache
Key: HIVE-17220
URL: https://issues.apache.org/jira/browse/HIVE-17220
Project: Hive
Issue Type: Bug
Affects Versions: 3.0.0
Reporter: Prasanth Jayachandran
Assignee: Prasanth Jayachandran
[~gopalv] observed perf profiles showing bloom filter probes as a bottleneck for
some of the TPC-DS queries, resulting in L1 data cache thrashing.
This is because the bloom filter's bitset is too large to fit in any level of
cache, and the hash bits for a single key map to widely separated segments of
that bitset. In the worst case this costs K-1 extra memory accesses (K being
the number of hash functions) for every probed key, because each bit lookup
misses in the L1 cache.
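To make the locality problem concrete, here is a hedged Java sketch of how a classic bloom filter derives its K probe positions by double hashing (position_i = h1 + i * h2), modeled loosely on the scheme bloom filters such as Hive's commonly use. It is not Hive's code: the class name, the SplitMix64-style mixer `mix64` (standing in for a real hash such as Murmur3), the 64 MiB bitset and K = 7 are all illustrative assumptions, not values from the benchmark. The sketch counts how many distinct 64-byte cache lines one probe touches; with a bitset this large the K positions typically land on different lines.

```java
import java.util.HashSet;
import java.util.Set;

/** Illustrative sketch (hypothetical, not Hive's BloomFilter): where a classic bloom filter probes. */
public class ClassicBloomProbe {
    // Hypothetical SplitMix64-style mixer standing in for a real hash function.
    static long mix64(long x) {
        long z = x + 0x9e3779b97f4a7c15L;
        z = (z ^ (z >>> 30)) * 0xbf58476d1ce4e5b9L;
        z = (z ^ (z >>> 27)) * 0x94d049bb133111ebL;
        return z ^ (z >>> 31);
    }

    /** The K bit positions checked for {@code key} in a bitset of {@code numBits} bits. */
    static long[] probePositions(long key, int numHashFunctions, long numBits) {
        long hash = mix64(key);
        int hash1 = (int) hash;          // lower 32 bits
        int hash2 = (int) (hash >>> 32); // upper 32 bits
        long[] positions = new long[numHashFunctions];
        for (int i = 1; i <= numHashFunctions; i++) {
            int combined = hash1 + i * hash2; // int overflow wraps around on purpose
            if (combined < 0) {
                combined = ~combined;         // flip to a non-negative index
            }
            positions[i - 1] = combined % numBits; // anywhere in the whole bitset
        }
        return positions;
    }

    /** Number of distinct 64-byte (512-bit) cache lines the positions fall into. */
    static int cacheLinesTouched(long[] positions) {
        Set<Long> lines = new HashSet<>();
        for (long p : positions) {
            lines.add(p / 512);
        }
        return lines.size();
    }

    public static void main(String[] args) {
        long numBits = 1L << 29; // 64 MiB bitset, far larger than any CPU cache (assumed size)
        int k = 7;               // hypothetical number of hash functions
        int keys = 100_000;
        long totalLines = 0;
        for (long key = 0; key < keys; key++) {
            totalLines += cacheLinesTouched(probePositions(key, k, numBits));
        }
        System.out.printf("avg cache lines touched per probe: %.2f (K = %d)%n",
                (double) totalLines / keys, k);
    }
}
```

Only the first of those lines can be brought in by a sequential access pattern; the rest are effectively random reads into a structure much bigger than L1, which is the behavior the perf counters reflect.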
Ran a JMH microbenchmark to verify this. The following is the JMH perf profile
for bloom filter probing:
{code}
Perf stats:
--------------------------------------------------
       5101.935637      task-clock (msec)         # 0.461 CPUs utilized
               346      context-switches          # 0.068 K/sec
               336      cpu-migrations            # 0.066 K/sec
             6,207      page-faults               # 0.001 M/sec
    10,016,486,301      cycles                    # 1.963 GHz                        (26.90%)
     5,751,692,176      stalled-cycles-frontend   # 57.42% frontend cycles idle      (27.05%)
   <not supported>      stalled-cycles-backend
    14,359,914,397      instructions              # 1.43 insns per cycle
                                                  # 0.40 stalled cycles per insn     (33.78%)
     2,200,632,861      branches                  # 431.333 M/sec                    (33.84%)
         1,162,860      branch-misses             # 0.05% of all branches            (33.97%)
     1,025,992,254      L1-dcache-loads           # 201.099 M/sec                    (26.56%)
       432,663,098      L1-dcache-load-misses     # 42.17% of all L1-dcache hits     (14.49%)
       331,383,297      LLC-loads                 # 64.952 M/sec                     (14.47%)
           203,524      LLC-load-misses           # 0.06% of all LL-cache hits       (21.67%)
   <not supported>      L1-icache-loads
         1,633,821      L1-icache-load-misses     # 0.320 M/sec                      (28.85%)
       950,368,796      dTLB-loads                # 186.276 M/sec                    (28.61%)
       246,813,393      dTLB-load-misses          # 25.97% of all dTLB cache hits    (14.53%)
            25,451      iTLB-loads                # 0.005 M/sec                      (14.48%)
            35,415      iTLB-load-misses          # 139.15% of all iTLB cache hits   (21.73%)
   <not supported>      L1-dcache-prefetches
           175,958      L1-dcache-prefetch-misses # 0.034 M/sec                      (28.94%)
      11.064783140 seconds time elapsed
{code}
This shows that 42.17% of L1 data cache loads miss (432,663,098 misses out of
1,025,992,254 loads).
This JIRA is to use a cache-efficient bloom filter for semijoin probing.
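One common cache-efficient design is a blocked bloom filter: the key's hash first selects a single 64-byte block (8 longs), and all K bits for that key are set and tested inside that block, so a probe touches one cache line instead of up to K. The Java below is a hedged sketch of that idea only, not the implementation this JIRA will necessarily adopt; the class and method names, the SplitMix64-style mixer, the re-mixing used to decorrelate block choice from in-block bits, and the 512-bit block size are all assumptions for illustration.

```java
/** Hypothetical blocked bloom filter sketch: every probe stays inside one 64-byte block. */
public class BlockedBloomFilter {
    private static final int LONGS_PER_BLOCK = 8;                   // 8 * 8 bytes = one 64-byte cache line
    private static final int BITS_PER_BLOCK = LONGS_PER_BLOCK * 64; // 512 bits

    private final long[] bits;
    private final int numBlocks;
    private final int k;

    public BlockedBloomFilter(int numBlocks, int numHashFunctions) {
        this.numBlocks = numBlocks;
        this.k = numHashFunctions;
        this.bits = new long[numBlocks * LONGS_PER_BLOCK];
    }

    // Hypothetical SplitMix64-style mixer standing in for a real hash function.
    static long mix64(long x) {
        long z = x + 0x9e3779b97f4a7c15L;
        z = (z ^ (z >>> 30)) * 0xbf58476d1ce4e5b9L;
        z = (z ^ (z >>> 27)) * 0x94d049bb133111ebL;
        return z ^ (z >>> 31);
    }

    /** Sets (set == true) or tests (set == false) the K bits of {@code key}. */
    private boolean probe(long key, boolean set) {
        long blockHash = mix64(key);
        long bitHash = mix64(blockHash); // re-mix so in-block bits don't reuse the block-selection bits
        int base = (int) Math.floorMod(blockHash, (long) numBlocks) * LONGS_PER_BLOCK;
        int hash1 = (int) bitHash;
        int hash2 = (int) (bitHash >>> 32);
        for (int i = 1; i <= k; i++) {
            int combined = hash1 + i * hash2;
            if (combined < 0) {
                combined = ~combined;
            }
            int bit = combined % BITS_PER_BLOCK;  // 0..511: never leaves the block
            int word = base + (bit >>> 6);        // which of the block's 8 longs
            long mask = 1L << bit;                // long shifts use only the low 6 bits (bit & 63)
            if (set) {
                bits[word] |= mask;
            } else if ((bits[word] & mask) == 0) {
                return false;                     // definitely absent
            }
        }
        return true;
    }

    public void add(long key) {
        probe(key, true);
    }

    public boolean mightContain(long key) {
        return probe(key, false);
    }

    public static void main(String[] args) {
        BlockedBloomFilter filter = new BlockedBloomFilter(1 << 16, 7); // 4 MiB of bits (assumed size)
        for (long key = 0; key < 200_000; key += 2) {
            filter.add(key); // even keys only
        }
        int falsePositives = 0;
        for (long key = 1; key < 200_000; key += 2) {
            if (filter.mightContain(key)) {
                falsePositives++;
            }
        }
        System.out.println("false positives among 100,000 absent keys: " + falsePositives);
    }
}
```

The trade-off is a somewhat higher false-positive rate than a classic filter of the same size, since each key's bits are concentrated in one block, in exchange for a single cache-line access per probe.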