Wei Zheng created HIVE-13755:
--------------------------------

Summary: Hybrid mapjoin allocates memory the same for multi broadcast
Key: HIVE-13755
URL: https://issues.apache.org/jira/browse/HIVE-13755
Project: Hive
Issue Type: Bug
Components: Hive
Affects Versions: 2.1.0
Reporter: Wei Zheng
Assignee: Wei Zheng

PROBLEM:
When hybrid mapjoin works out how much memory it needs, it estimates the memory for each hashtable in the same way. This can cause problems when there are multiple broadcast inputs, because the hashtables together may exceed the memory intended for them. In the excerpt below, every hashtable reports the same total available memory (1968177152).

Example reducer task log attached. This task has 5 broadcast inputs:
Reducer 3 <- Map 10 (BROADCAST_EDGE), Map 11 (BROADCAST_EDGE), Map 12 (BROADCAST_EDGE), Map 8 (SIMPLE_EDGE), Map 9 (BROADCAST_EDGE), Reducer 2 (SIMPLE_EDGE)

Excerpt of it:
{code}
2016-03-15 19:23:50,811 [INFO] [pool-47-thread-1] |tez.HashTableLoader|: Memory manager allocates 0 bytes for the loading hashtable.
2016-03-15 19:23:50,811 [INFO] [pool-47-thread-1] |persistence.HashMapWrapper|: Key count from statistics is 210; setting map size to 280
2016-03-15 19:23:50,811 [INFO] [pool-47-thread-1] |persistence.HybridHashTableContainer|: Total available memory: 1968177152
2016-03-15 19:23:50,812 [INFO] [pool-47-thread-1] |persistence.HybridHashTableContainer|: Estimated small table size: 155190
2016-03-15 19:23:50,812 [INFO] [pool-47-thread-1] |persistence.HybridHashTableContainer|: Number of hash partitions to be created: 16
2016-03-15 19:23:50,812 [INFO] [pool-47-thread-1] |persistence.HybridHashTableContainer|: Write buffer size: 524288
2016-03-15 19:23:50,812 [INFO] [pool-47-thread-1] |persistence.HybridHashTableContainer|: Number of partitions created: 16
2016-03-15 19:23:50,812 [INFO] [pool-47-thread-1] |persistence.HybridHashTableContainer|: Number of partitions spilled directly to disk on creation: 0
2016-03-15 19:23:50,812 [INFO] [pool-47-thread-1] |tez.HashTableLoader|: Using tableContainer HybridHashTableContainer
2016-03-15 19:23:50,812 [INFO] [pool-47-thread-1] |persistence.HybridHashTableContainer|: Initializing container with org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe and org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe
2016-03-15 19:23:50,814 [INFO] [pool-47-thread-1] |readers.UnorderedKVReader|: Num Records read: 20
2016-03-15 19:23:50,814 [INFO] [pool-47-thread-1] |log.PerfLogger|: </PERFLOG method=LoadHashtable start=1458069830811 end=1458069830814 duration=3 from=org.apache.hadoop.hive.ql.exec.MapJoinOperator>
2016-03-15 19:23:50,814 [INFO] [pool-47-thread-1] |tez.ObjectCache|: Caching key: svc-phx-efmhadoop_20160315191303_8c53ce88-e64f-4d36-bad0-846bbf096f57__HASH_MAP_MAPJOIN_126_container
2016-03-15 19:23:50,814 [INFO] [TezChild] |exec.HashTableDummyOperator|: Initializing operator HASHTABLEDUMMY[32]
2016-03-15 19:23:50,814 [INFO] [TezChild] |exec.MapJoinOperator|: Initializing operator MAPJOIN[26]
2016-03-15 19:23:50,816 [INFO] [TezChild] |exec.CommonJoinOperator|: JOIN struct<_col3:string,_col4:decimal(5,0),_col5:char(1),_col6:char(1),_col7:date,_col8:string,_col9:string,_col12:string,_col13:string,_col14:string,_col15:string,_col16:string,_col19:decimal(13,3),_col20:string,_col22:decimal(5,0),_col23:decimal(5,0),_col24:decimal(5,0),_col25:decimal(5,0),_col26:decimal(13,2),_col27:decimal(5,0),_col28:decimal(15,2),_col29:decimal(15,2),_col31:decimal(3,0),_col33:char(1),_col41:decimal(3,1),_col42:char(1),_col43:decimal(3,1),_col44:string,_col45:char(1),_col48:char(1),_col55:char(1),_col57:char(1),_col59:char(1),_col60:string,_col64:string,_col65:string,_col67:decimal(15,2),_col76:decimal(3,0),_col81:char(1),_col98:string,_col99:string,_col105:string,_col108:string,_col122:string,_col123:decimal(5,0),_col127:string,_col128:decimal(5,0),_col129:string,_col137:char(1),_col139:string,_col145:string,_col151:string,_col152:string,_col154:string,_col158:char(1),_col164:char(1),_col204:string,_col213:string,_col214:char(1),_col215:string,_col218:char(1),_col219:date,_col220:string,_col221:decimal(5,0),_col222:decimal(5,0),_col223:string,_col224:char(1),_col225:string,_col226:decimal(3,0),_col231:string,_col232:string,_col233:string,_col234:decimal(9,5),_col236:date,_col240:date,_col256:string,_col257:string,_col268:string,_col269:string,_col270:char(1),_col271:string,_col272:char(1),_col324:string,_col344:string,_col464:string,_col478:decimal(5,0),_col479:decimal(5,0),_col519:string,_col532:string,_col534:char(1),_col540:decimal(13,3),_col541:decimal(13,3),_col561:string,_col568:char(1),_col570:string> totalsz = 95
2016-03-15 19:23:50,817 [INFO] [pool-47-thread-1] |log.PerfLogger|: <PERFLOG method=LoadHashtable from=org.apache.hadoop.hive.ql.exec.MapJoinOperator>
2016-03-15 19:23:50,817 [INFO] [pool-47-thread-1] |tez.HashTableLoader|: Memory manager allocates 0 bytes for the loading hashtable.
2016-03-15 19:23:50,817 [INFO] [pool-47-thread-1] |persistence.HashMapWrapper|: Key count from statistics is 5942112; setting map size to 7922816
2016-03-15 19:23:50,817 [INFO] [pool-47-thread-1] |persistence.HybridHashTableContainer|: Total available memory: 1968177152
2016-03-15 19:23:50,817 [INFO] [pool-47-thread-1] |persistence.HybridHashTableContainer|: Estimated small table size: 1324101915
2016-03-15 19:23:50,817 [INFO] [pool-47-thread-1] |persistence.HybridHashTableContainer|: Number of hash partitions to be created: 16
2016-03-15 19:23:50,817 [INFO] [pool-47-thread-1] |persistence.HybridHashTableContainer|: Write buffer size: 8388608
2016-03-15 19:23:50,831 [INFO] [pool-47-thread-1] |persistence.HybridHashTableContainer|: Number of partitions created: 16
2016-03-15 19:23:50,831 [INFO] [pool-47-thread-1] |persistence.HybridHashTableContainer|: Number of partitions spilled directly to disk on creation: 0
2016-03-15 19:23:50,831 [INFO] [pool-47-thread-1] |tez.HashTableLoader|: Using tableContainer HybridHashTableContainer
2016-03-15 19:23:50,831 [INFO] [pool-47-thread-1] |persistence.HybridHashTableContainer|: Initializing container with org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe and org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe
2016-03-15 19:23:51,543 [INFO] [pool-47-thread-1] |readers.UnorderedKVReader|: Num Records read: 852596
2016-03-15 19:23:51,563 [INFO] [pool-47-thread-1] |log.PerfLogger|: </PERFLOG method=LoadHashtable start=1458069830817 end=1458069831563 duration=746 from=org.apache.hadoop.hive.ql.exec.MapJoinOperator>
2016-03-15 19:23:51,563 [INFO] [pool-47-thread-1] |tez.ObjectCache|: Caching key: svc-phx-efmhadoop_20160315191303_8c53ce88-e64f-4d36-bad0-846bbf096f57__HASH_MAP_MAPJOIN_127_container
2016-03-15 19:23:51,563 [INFO] [TezChild] |exec.HashTableDummyOperator|: Initializing operator HASHTABLEDUMMY[31]
2016-03-15 19:23:51,564 [INFO] [TezChild] |exec.MapJoinOperator|: Initializing operator MAPJOIN[27]
2016-03-15 19:23:51,566 [INFO] [TezChild] |exec.CommonJoinOperator|: JOIN struct<_col3:string,_col4:decimal(5,0),_col5:char(1),_col6:char(1),_col7:date,_col8:string,_col9:string,_col12:string,_col13:string,_col14:string,_col15:string,_col16:string,_col19:decimal(13,3),_col20:string,_col22:decimal(5,0),_col23:decimal(5,0),_col24:decimal(5,0),_col25:decimal(5,0),_col26:decimal(13,2),_col27:decimal(5,0),_col28:decimal(15,2),_col29:decimal(15,2),_col31:decimal(3,0),_col33:char(1),_col41:decimal(3,1),_col42:char(1),_col43:decimal(3,1),_col44:string,_col45:char(1),_col48:char(1),_col55:char(1),_col57:char(1),_col59:char(1),_col60:string,_col64:string,_col65:string,_col67:decimal(15,2),_col76:decimal(3,0),_col81:char(1),_col98:string,_col99:string,_col105:string,_col108:string,_col122:string,_col123:decimal(5,0),_col127:string,_col128:decimal(5,0),_col129:string,_col137:char(1),_col139:string,_col145:string,_col151:string,_col152:string,_col154:string,_col158:char(1),_col164:char(1),_col204:string,_col213:string,_col214:char(1),_col215:string,_col218:char(1),_col219:date,_col220:string,_col221:decimal(5,0),_col222:decimal(5,0),_col223:string,_col224:char(1),_col225:string,_col226:decimal(3,0),_col231:string,_col232:string,_col233:string,_col234:decimal(9,5),_col236:date,_col240:date,_col256:string,_col257:string,_col268:string,_col269:string,_col270:char(1),_col271:string,_col272:char(1),_col324:string,_col344:string,_col464:string,_col478:decimal(5,0),_col479:decimal(5,0),_col519:string,_col532:string,_col534:char(1),_col540:decimal(13,3),_col541:decimal(13,3),_col561:string> totalsz = 93
2016-03-15 19:23:51,566 [INFO] [pool-47-thread-1] |log.PerfLogger|: <PERFLOG method=LoadHashtable from=org.apache.hadoop.hive.ql.exec.MapJoinOperator>
2016-03-15 19:23:51,567 [INFO] [pool-47-thread-1] |tez.HashTableLoader|: Memory manager allocates 0 bytes for the loading hashtable.
2016-03-15 19:23:51,567 [INFO] [pool-47-thread-1] |persistence.HashMapWrapper|: Key count from statistics is 293380; setting map size to 391174
2016-03-15 19:23:51,567 [INFO] [pool-47-thread-1] |persistence.HybridHashTableContainer|: Total available memory: 1968177152
2016-03-15 19:23:51,567 [INFO] [pool-47-thread-1] |persistence.HybridHashTableContainer|: Estimated small table size: 69929471
2016-03-15 19:23:51,567 [INFO] [pool-47-thread-1] |persistence.HybridHashTableContainer|: Number of hash partitions to be created: 16
2016-03-15 19:23:51,567 [INFO] [pool-47-thread-1] |persistence.HybridHashTableContainer|: Write buffer size: 4194304
2016-03-15 19:23:51,568 [INFO] [pool-47-thread-1] |persistence.HybridHashTableContainer|: Number of partitions created: 16
2016-03-15 19:23:51,568 [INFO] [pool-47-thread-1] |persistence.HybridHashTableContainer|: Number of partitions spilled directly to disk on creation: 0
2016-03-15 19:23:51,568 [INFO] [pool-47-thread-1] |tez.HashTableLoader|: Using tableContainer HybridHashTableContainer
2016-03-15 19:23:51,569 [INFO] [pool-47-thread-1] |persistence.HybridHashTableContainer|: Initializing container with org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe and org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe
2016-03-15 19:23:51,980 [INFO] [pool-47-thread-1] |readers.UnorderedKVReader|: Num Records read: 586760
{code}
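
To make the arithmetic behind the report concrete, here is a minimal sketch using the three hashtables visible in the excerpt. It is not Hive code and is not necessarily how this issue should be fixed. The class `BroadcastMemorySketch` and the methods `nominalCommitment` and `proportionalShares` are hypothetical names, and splitting the budget in proportion to the estimated small table size is only one possible policy, assumed here for illustration.

```java
import java.math.BigInteger;
import java.util.Arrays;

// Hypothetical illustration only; none of these names exist in Hive.
final class BroadcastMemorySketch {

    // Memory nominally promised when every hashtable is sized against
    // the same "Total available memory" figure, as in the log excerpt.
    static long nominalCommitment(long totalMemory, int tableCount) {
        return Math.multiplyExact(totalMemory, (long) tableCount);
    }

    // One possible alternative (an assumption, not the Hive behavior):
    // split a single budget in proportion to each estimated table size.
    // BigInteger keeps the intermediate product exact; each share is
    // rounded down, so the shares never add up to more than totalMemory.
    static long[] proportionalShares(long totalMemory, long[] estimatedSizes) {
        long sum = Arrays.stream(estimatedSizes).sum();
        if (sum <= 0) {
            throw new IllegalArgumentException("estimated sizes must sum to a positive value");
        }
        long[] shares = new long[estimatedSizes.length];
        for (int i = 0; i < estimatedSizes.length; i++) {
            shares[i] = BigInteger.valueOf(totalMemory)
                    .multiply(BigInteger.valueOf(estimatedSizes[i]))
                    .divide(BigInteger.valueOf(sum))
                    .longValueExact();
        }
        return shares;
    }

    public static void main(String[] args) {
        // Values copied from the log excerpt above.
        long totalMemory = 1968177152L;
        long[] estimated = {155190L, 1324101915L, 69929471L};

        System.out.println("Budget seen by each hashtable: " + totalMemory);
        System.out.println("Nominal commitment for " + estimated.length + " hashtables: "
                + nominalCommitment(totalMemory, estimated.length));
        System.out.println("Proportional shares of one budget: "
                + Arrays.toString(proportionalShares(totalMemory, estimated)));
    }
}
```

With the same budget handed to every hashtable, the nominal commitment grows linearly with the number of broadcast inputs, while a shared budget stays bounded by the single total no matter how many inputs the task has.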