Wei Zheng created HIVE-13755:
--------------------------------
Summary: Hybrid mapjoin allocates memory the same for multi broadcast
Key: HIVE-13755
URL: https://issues.apache.org/jira/browse/HIVE-13755
Project: Hive
Issue Type: Bug
Components: Hive
Affects Versions: 2.1.0
Reporter: Wei Zheng
Assignee: Wei Zheng
PROBLEM:
When hybrid mapjoin computes the memory it needs, it uses the same memory
estimate for every hashtable. This can cause problems when there are multiple
broadcast inputs, because together the hashtables may exceed the memory that
was intended to be allocated to them.
Example reducer task log attached. This task has 5 broadcast inputs:
Reducer 3 <- Map 10 (BROADCAST_EDGE), Map 11 (BROADCAST_EDGE), Map 12
(BROADCAST_EDGE), Map 8 (SIMPLE_EDGE), Map 9 (BROADCAST_EDGE), Reducer 2
(SIMPLE_EDGE)
Excerpt of the log:
{code}
2016-03-15 19:23:50,811 [INFO] [pool-47-thread-1] |tez.HashTableLoader|: Memory
manager allocates 0 bytes for the loading hashtable.
2016-03-15 19:23:50,811 [INFO] [pool-47-thread-1] |persistence.HashMapWrapper|:
Key count from statistics is 210; setting map size to 280
2016-03-15 19:23:50,811 [INFO] [pool-47-thread-1]
|persistence.HybridHashTableContainer|: Total available memory: 1968177152
2016-03-15 19:23:50,812 [INFO] [pool-47-thread-1]
|persistence.HybridHashTableContainer|: Estimated small table size: 155190
2016-03-15 19:23:50,812 [INFO] [pool-47-thread-1]
|persistence.HybridHashTableContainer|: Number of hash partitions to be
created: 16
2016-03-15 19:23:50,812 [INFO] [pool-47-thread-1]
|persistence.HybridHashTableContainer|: Write buffer size: 524288
2016-03-15 19:23:50,812 [INFO] [pool-47-thread-1]
|persistence.HybridHashTableContainer|: Number of partitions created: 16
2016-03-15 19:23:50,812 [INFO] [pool-47-thread-1]
|persistence.HybridHashTableContainer|: Number of partitions spilled directly
to disk on creation: 0
2016-03-15 19:23:50,812 [INFO] [pool-47-thread-1] |tez.HashTableLoader|: Using
tableContainer HybridHashTableContainer
2016-03-15 19:23:50,812 [INFO] [pool-47-thread-1]
|persistence.HybridHashTableContainer|: Initializing container with
org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe and
org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe
2016-03-15 19:23:50,814 [INFO] [pool-47-thread-1] |readers.UnorderedKVReader|:
Num Records read: 20
2016-03-15 19:23:50,814 [INFO] [pool-47-thread-1] |log.PerfLogger|: </PERFLOG
method=LoadHashtable start=1458069830811 end=1458069830814 duration=3
from=org.apache.hadoop.hive.ql.exec.MapJoinOperator>
2016-03-15 19:23:50,814 [INFO] [pool-47-thread-1] |tez.ObjectCache|: Caching
key:
svc-phx-efmhadoop_20160315191303_8c53ce88-e64f-4d36-bad0-846bbf096f57__HASH_MAP_MAPJOIN_126_container
2016-03-15 19:23:50,814 [INFO] [TezChild] |exec.HashTableDummyOperator|:
Initializing operator HASHTABLEDUMMY[32]
2016-03-15 19:23:50,814 [INFO] [TezChild] |exec.MapJoinOperator|: Initializing
operator MAPJOIN[26]
2016-03-15 19:23:50,816 [INFO] [TezChild] |exec.CommonJoinOperator|: JOIN
struct<_col3:string,_col4:decimal(5,0),_col5:char(1),_col6:char(1),_col7:date,_col8:string,_col9:string,_col12:string,_col13:string,_col14:string,_col15:string,_col16:string,_col19:decimal(13,3),_col20:string,_col22:decimal(5,0),_col23:decimal(5,0),_col24:decimal(5,0),_col25:decimal(5,0),_col26:decimal(13,2),_col27:decimal(5,0),_col28:decimal(15,2),_col29:decimal(15,2),_col31:decimal(3,0),_col33:char(1),_col41:decimal(3,1),_col42:char(1),_col43:decimal(3,1),_col44:string,_col45:char(1),_col48:char(1),_col55:char(1),_col57:char(1),_col59:char(1),_col60:string,_col64:string,_col65:string,_col67:decimal(15,2),_col76:decimal(3,0),_col81:char(1),_col98:string,_col99:string,_col105:string,_col108:string,_col122:string,_col123:decimal(5,0),_col127:string,_col128:decimal(5,0),_col129:string,_col137:char(1),_col139:string,_col145:string,_col151:string,_col152:string,_col154:string,_col158:char(1),_col164:char(1),_col204:string,_col213:string,_col214:char(1),_col215:string,_col218:char(1),_col219:date,_col220:string,_col221:decimal(5,0),_col222:decimal(5,0),_col223:string,_col224:char(1),_col225:string,_col226:decimal(3,0),_col231:string,_col232:string,_col233:string,_col234:decimal(9,5),_col236:date,_col240:date,_col256:string,_col257:string,_col268:string,_col269:string,_col270:char(1),_col271:string,_col272:char(1),_col324:string,_col344:string,_col464:string,_col478:decimal(5,0),_col479:decimal(5,0),_col519:string,_col532:string,_col534:char(1),_col540:decimal(13,3),_col541:decimal(13,3),_col561:string,_col568:char(1),_col570:string>
totalsz = 95
2016-03-15 19:23:50,817 [INFO] [pool-47-thread-1] |log.PerfLogger|: <PERFLOG
method=LoadHashtable from=org.apache.hadoop.hive.ql.exec.MapJoinOperator>
2016-03-15 19:23:50,817 [INFO] [pool-47-thread-1] |tez.HashTableLoader|: Memory
manager allocates 0 bytes for the loading hashtable.
2016-03-15 19:23:50,817 [INFO] [pool-47-thread-1] |persistence.HashMapWrapper|:
Key count from statistics is 5942112; setting map size to 7922816
2016-03-15 19:23:50,817 [INFO] [pool-47-thread-1]
|persistence.HybridHashTableContainer|: Total available memory: 1968177152
2016-03-15 19:23:50,817 [INFO] [pool-47-thread-1]
|persistence.HybridHashTableContainer|: Estimated small table size: 1324101915
2016-03-15 19:23:50,817 [INFO] [pool-47-thread-1]
|persistence.HybridHashTableContainer|: Number of hash partitions to be
created: 16
2016-03-15 19:23:50,817 [INFO] [pool-47-thread-1]
|persistence.HybridHashTableContainer|: Write buffer size: 8388608
2016-03-15 19:23:50,831 [INFO] [pool-47-thread-1]
|persistence.HybridHashTableContainer|: Number of partitions created: 16
2016-03-15 19:23:50,831 [INFO] [pool-47-thread-1]
|persistence.HybridHashTableContainer|: Number of partitions spilled directly
to disk on creation: 0
2016-03-15 19:23:50,831 [INFO] [pool-47-thread-1] |tez.HashTableLoader|: Using
tableContainer HybridHashTableContainer
2016-03-15 19:23:50,831 [INFO] [pool-47-thread-1]
|persistence.HybridHashTableContainer|: Initializing container with
org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe and
org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe
2016-03-15 19:23:51,543 [INFO] [pool-47-thread-1] |readers.UnorderedKVReader|:
Num Records read: 852596
2016-03-15 19:23:51,563 [INFO] [pool-47-thread-1] |log.PerfLogger|: </PERFLOG
method=LoadHashtable start=1458069830817 end=1458069831563 duration=746
from=org.apache.hadoop.hive.ql.exec.MapJoinOperator>
2016-03-15 19:23:51,563 [INFO] [pool-47-thread-1] |tez.ObjectCache|: Caching
key:
svc-phx-efmhadoop_20160315191303_8c53ce88-e64f-4d36-bad0-846bbf096f57__HASH_MAP_MAPJOIN_127_container
2016-03-15 19:23:51,563 [INFO] [TezChild] |exec.HashTableDummyOperator|:
Initializing operator HASHTABLEDUMMY[31]
2016-03-15 19:23:51,564 [INFO] [TezChild] |exec.MapJoinOperator|: Initializing
operator MAPJOIN[27]
2016-03-15 19:23:51,566 [INFO] [TezChild] |exec.CommonJoinOperator|: JOIN
struct<_col3:string,_col4:decimal(5,0),_col5:char(1),_col6:char(1),_col7:date,_col8:string,_col9:string,_col12:string,_col13:string,_col14:string,_col15:string,_col16:string,_col19:decimal(13,3),_col20:string,_col22:decimal(5,0),_col23:decimal(5,0),_col24:decimal(5,0),_col25:decimal(5,0),_col26:decimal(13,2),_col27:decimal(5,0),_col28:decimal(15,2),_col29:decimal(15,2),_col31:decimal(3,0),_col33:char(1),_col41:decimal(3,1),_col42:char(1),_col43:decimal(3,1),_col44:string,_col45:char(1),_col48:char(1),_col55:char(1),_col57:char(1),_col59:char(1),_col60:string,_col64:string,_col65:string,_col67:decimal(15,2),_col76:decimal(3,0),_col81:char(1),_col98:string,_col99:string,_col105:string,_col108:string,_col122:string,_col123:decimal(5,0),_col127:string,_col128:decimal(5,0),_col129:string,_col137:char(1),_col139:string,_col145:string,_col151:string,_col152:string,_col154:string,_col158:char(1),_col164:char(1),_col204:string,_col213:string,_col214:char(1),_col215:string,_col218:char(1),_col219:date,_col220:string,_col221:decimal(5,0),_col222:decimal(5,0),_col223:string,_col224:char(1),_col225:string,_col226:decimal(3,0),_col231:string,_col232:string,_col233:string,_col234:decimal(9,5),_col236:date,_col240:date,_col256:string,_col257:string,_col268:string,_col269:string,_col270:char(1),_col271:string,_col272:char(1),_col324:string,_col344:string,_col464:string,_col478:decimal(5,0),_col479:decimal(5,0),_col519:string,_col532:string,_col534:char(1),_col540:decimal(13,3),_col541:decimal(13,3),_col561:string>
totalsz = 93
2016-03-15 19:23:51,566 [INFO] [pool-47-thread-1] |log.PerfLogger|: <PERFLOG
method=LoadHashtable from=org.apache.hadoop.hive.ql.exec.MapJoinOperator>
2016-03-15 19:23:51,567 [INFO] [pool-47-thread-1] |tez.HashTableLoader|: Memory
manager allocates 0 bytes for the loading hashtable.
2016-03-15 19:23:51,567 [INFO] [pool-47-thread-1] |persistence.HashMapWrapper|:
Key count from statistics is 293380; setting map size to 391174
2016-03-15 19:23:51,567 [INFO] [pool-47-thread-1]
|persistence.HybridHashTableContainer|: Total available memory: 1968177152
2016-03-15 19:23:51,567 [INFO] [pool-47-thread-1]
|persistence.HybridHashTableContainer|: Estimated small table size: 69929471
2016-03-15 19:23:51,567 [INFO] [pool-47-thread-1]
|persistence.HybridHashTableContainer|: Number of hash partitions to be
created: 16
2016-03-15 19:23:51,567 [INFO] [pool-47-thread-1]
|persistence.HybridHashTableContainer|: Write buffer size: 4194304
2016-03-15 19:23:51,568 [INFO] [pool-47-thread-1]
|persistence.HybridHashTableContainer|: Number of partitions created: 16
2016-03-15 19:23:51,568 [INFO] [pool-47-thread-1]
|persistence.HybridHashTableContainer|: Number of partitions spilled directly
to disk on creation: 0
2016-03-15 19:23:51,568 [INFO] [pool-47-thread-1] |tez.HashTableLoader|: Using
tableContainer HybridHashTableContainer
2016-03-15 19:23:51,569 [INFO] [pool-47-thread-1]
|persistence.HybridHashTableContainer|: Initializing container with
org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe and
org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe
2016-03-15 19:23:51,980 [INFO] [pool-47-thread-1] |readers.UnorderedKVReader|:
Num Records read: 586760
{code}
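To make the numbers in the excerpt concrete, here is a minimal sketch of the arithmetic. It assumes that each hashtable load treats the logged "Total available memory" as its own budget, which is how the three loads above read (each reports 1968177152). The class and method names are hypothetical and are not Hive APIs.

```java
// Illustrative sketch only. BroadcastMemorySketch and its methods are
// hypothetical names used for this example; they are not part of Hive.
final class BroadcastMemorySketch {

    // Sum of the per-table values, e.g. the "Estimated small table size" lines.
    static long sum(long[] values) {
        long total = 0L;
        for (long v : values) {
            total = Math.addExact(total, v);
        }
        return total;
    }

    // Aggregate budget if every hashtable is sized against the same total,
    // instead of the tables sharing that total between them (assumption).
    static long aggregateBudget(long totalAvailable, int tableCount) {
        return Math.multiplyExact(totalAvailable, (long) tableCount);
    }

    public static void main(String[] args) {
        // Values copied from the log excerpt above.
        long totalAvailable = 1968177152L;
        long[] estimatedSizes = {155190L, 1324101915L, 69929471L};

        long aggregate = aggregateBudget(totalAvailable, estimatedSizes.length);
        System.out.println("Total available memory (per log line): " + totalAvailable);
        System.out.println("Sum of estimated small table sizes:    " + sum(estimatedSizes));
        System.out.println("Aggregate budget across the 3 loads:   " + aggregate);
        System.out.println("Oversubscription factor:               "
                + (double) aggregate / totalAvailable);
    }
}
```

The sketch only covers the three loads visible in the excerpt. The point is that the budget handed out grows with the number of broadcast inputs while the memory actually available does not.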