Wei Zheng created HIVE-13755:
--------------------------------

             Summary: Hybrid mapjoin allocates memory the same for multi broadcast
                 Key: HIVE-13755
                 URL: https://issues.apache.org/jira/browse/HIVE-13755
             Project: Hive
          Issue Type: Bug
          Components: Hive
    Affects Versions: 2.1.0
            Reporter: Wei Zheng
            Assignee: Wei Zheng


PROBLEM:

When hybrid mapjoin computes the memory it needs, it uses the same memory 
estimate for every hashtable. This can cause problems when there are multiple 
broadcast inputs, because the combined allocation may exceed the memory 
intended for the join.
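
To make the arithmetic concrete, here is a minimal Java sketch. It is not Hive 
code: the class and method names (BroadcastMemorySketch, worstCaseTotal, 
proportionalShares) are hypothetical, and the proportional split is only one 
possible policy, assumed here for illustration. This issue does not prescribe 
a fix. The figures come from the log excerpt in this report.

```java
import java.math.BigInteger;

/** Hypothetical illustration only; none of these names exist in Hive. */
final class BroadcastMemorySketch {

    /** Worst-case total when each of n hashtables is sized against the full budget. */
    static long worstCaseTotal(long budget, int n) {
        return Math.multiplyExact(budget, (long) n);
    }

    /**
     * Assumed alternative policy: split one budget across the hashtables in
     * proportion to their estimated sizes. Shares are rounded down, so their
     * sum never exceeds the budget.
     */
    static long[] proportionalShares(long budget, long[] estimatedSizes) {
        BigInteger total = BigInteger.ZERO;
        for (long size : estimatedSizes) {
            total = total.add(BigInteger.valueOf(size));
        }
        if (total.signum() <= 0) {
            throw new IllegalArgumentException("estimated sizes must sum to a positive value");
        }
        long[] shares = new long[estimatedSizes.length];
        for (int i = 0; i < shares.length; i++) {
            // BigInteger avoids overflow in budget * size for large inputs.
            shares[i] = BigInteger.valueOf(budget)
                    .multiply(BigInteger.valueOf(estimatedSizes[i]))
                    .divide(total)
                    .longValueExact();
        }
        return shares;
    }

    public static void main(String[] args) {
        long budget = 1968177152L; // "Total available memory" from the log
        long[] estimates = {155190L, 1324101915L, 69929471L}; // "Estimated small table size"
        System.out.println("worst case: " + worstCaseTotal(budget, estimates.length));
        for (long share : proportionalShares(budget, estimates)) {
            System.out.println("share: " + share);
        }
    }
}
```

With the three hashtables in the excerpt, sizing each one against the full 
1968177152 bytes allows up to three times that amount in the worst case, 
whereas a shared budget keeps the total bounded.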

Example reducer task log attached. This task has 5 broadcast inputs:

Reducer 3 <- Map 10 (BROADCAST_EDGE), Map 11 (BROADCAST_EDGE), Map 12 
(BROADCAST_EDGE), Map 8 (SIMPLE_EDGE), Map 9 (BROADCAST_EDGE), Reducer 2 
(SIMPLE_EDGE)



Excerpt of the log:

{code}
2016-03-15 19:23:50,811 [INFO] [pool-47-thread-1] |tez.HashTableLoader|: Memory 
manager allocates 0 bytes for the loading hashtable.
2016-03-15 19:23:50,811 [INFO] [pool-47-thread-1] |persistence.HashMapWrapper|: 
Key count from statistics is 210; setting map size to 280
2016-03-15 19:23:50,811 [INFO] [pool-47-thread-1] 
|persistence.HybridHashTableContainer|: Total available memory: 1968177152
2016-03-15 19:23:50,812 [INFO] [pool-47-thread-1] 
|persistence.HybridHashTableContainer|: Estimated small table size: 155190
2016-03-15 19:23:50,812 [INFO] [pool-47-thread-1] 
|persistence.HybridHashTableContainer|: Number of hash partitions to be 
created: 16
2016-03-15 19:23:50,812 [INFO] [pool-47-thread-1] 
|persistence.HybridHashTableContainer|: Write buffer size: 524288
2016-03-15 19:23:50,812 [INFO] [pool-47-thread-1] 
|persistence.HybridHashTableContainer|: Number of partitions created: 16
2016-03-15 19:23:50,812 [INFO] [pool-47-thread-1] 
|persistence.HybridHashTableContainer|: Number of partitions spilled directly 
to disk on creation: 0
2016-03-15 19:23:50,812 [INFO] [pool-47-thread-1] |tez.HashTableLoader|: Using 
tableContainer HybridHashTableContainer
2016-03-15 19:23:50,812 [INFO] [pool-47-thread-1] 
|persistence.HybridHashTableContainer|: Initializing container with 
org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe and 
org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe
2016-03-15 19:23:50,814 [INFO] [pool-47-thread-1] |readers.UnorderedKVReader|: 
Num Records read: 20
2016-03-15 19:23:50,814 [INFO] [pool-47-thread-1] |log.PerfLogger|: </PERFLOG 
method=LoadHashtable start=1458069830811 end=1458069830814 duration=3 
from=org.apache.hadoop.hive.ql.exec.MapJoinOperator>
2016-03-15 19:23:50,814 [INFO] [pool-47-thread-1] |tez.ObjectCache|: Caching 
key: 
svc-phx-efmhadoop_20160315191303_8c53ce88-e64f-4d36-bad0-846bbf096f57__HASH_MAP_MAPJOIN_126_container
2016-03-15 19:23:50,814 [INFO] [TezChild] |exec.HashTableDummyOperator|: 
Initializing operator HASHTABLEDUMMY[32]
2016-03-15 19:23:50,814 [INFO] [TezChild] |exec.MapJoinOperator|: Initializing 
operator MAPJOIN[26]
2016-03-15 19:23:50,816 [INFO] [TezChild] |exec.CommonJoinOperator|: JOIN 
struct<_col3:string,_col4:decimal(5,0),_col5:char(1),_col6:char(1),_col7:date,_col8:string,_col9:string,_col12:string,_col13:string,_col14:string,_col15:string,_col16:string,_col19:decimal(13,3),_col20:string,_col22:decimal(5,0),_col23:decimal(5,0),_col24:decimal(5,0),_col25:decimal(5,0),_col26:decimal(13,2),_col27:decimal(5,0),_col28:decimal(15,2),_col29:decimal(15,2),_col31:decimal(3,0),_col33:char(1),_col41:decimal(3,1),_col42:char(1),_col43:decimal(3,1),_col44:string,_col45:char(1),_col48:char(1),_col55:char(1),_col57:char(1),_col59:char(1),_col60:string,_col64:string,_col65:string,_col67:decimal(15,2),_col76:decimal(3,0),_col81:char(1),_col98:string,_col99:string,_col105:string,_col108:string,_col122:string,_col123:decimal(5,0),_col127:string,_col128:decimal(5,0),_col129:string,_col137:char(1),_col139:string,_col145:string,_col151:string,_col152:string,_col154:string,_col158:char(1),_col164:char(1),_col204:string,_col213:string,_col214:char(1),_col215:string,_col218:char(1),_col219:date,_col220:string,_col221:decimal(5,0),_col222:decimal(5,0),_col223:string,_col224:char(1),_col225:string,_col226:decimal(3,0),_col231:string,_col232:string,_col233:string,_col234:decimal(9,5),_col236:date,_col240:date,_col256:string,_col257:string,_col268:string,_col269:string,_col270:char(1),_col271:string,_col272:char(1),_col324:string,_col344:string,_col464:string,_col478:decimal(5,0),_col479:decimal(5,0),_col519:string,_col532:string,_col534:char(1),_col540:decimal(13,3),_col541:decimal(13,3),_col561:string,_col568:char(1),_col570:string>
 totalsz = 95
2016-03-15 19:23:50,817 [INFO] [pool-47-thread-1] |log.PerfLogger|: <PERFLOG 
method=LoadHashtable from=org.apache.hadoop.hive.ql.exec.MapJoinOperator>
2016-03-15 19:23:50,817 [INFO] [pool-47-thread-1] |tez.HashTableLoader|: Memory 
manager allocates 0 bytes for the loading hashtable.
2016-03-15 19:23:50,817 [INFO] [pool-47-thread-1] |persistence.HashMapWrapper|: 
Key count from statistics is 5942112; setting map size to 7922816
2016-03-15 19:23:50,817 [INFO] [pool-47-thread-1] 
|persistence.HybridHashTableContainer|: Total available memory: 1968177152
2016-03-15 19:23:50,817 [INFO] [pool-47-thread-1] 
|persistence.HybridHashTableContainer|: Estimated small table size: 1324101915
2016-03-15 19:23:50,817 [INFO] [pool-47-thread-1] 
|persistence.HybridHashTableContainer|: Number of hash partitions to be 
created: 16
2016-03-15 19:23:50,817 [INFO] [pool-47-thread-1] 
|persistence.HybridHashTableContainer|: Write buffer size: 8388608
2016-03-15 19:23:50,831 [INFO] [pool-47-thread-1] 
|persistence.HybridHashTableContainer|: Number of partitions created: 16
2016-03-15 19:23:50,831 [INFO] [pool-47-thread-1] 
|persistence.HybridHashTableContainer|: Number of partitions spilled directly 
to disk on creation: 0
2016-03-15 19:23:50,831 [INFO] [pool-47-thread-1] |tez.HashTableLoader|: Using 
tableContainer HybridHashTableContainer
2016-03-15 19:23:50,831 [INFO] [pool-47-thread-1] 
|persistence.HybridHashTableContainer|: Initializing container with 
org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe and 
org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe
2016-03-15 19:23:51,543 [INFO] [pool-47-thread-1] |readers.UnorderedKVReader|: 
Num Records read: 852596
2016-03-15 19:23:51,563 [INFO] [pool-47-thread-1] |log.PerfLogger|: </PERFLOG 
method=LoadHashtable start=1458069830817 end=1458069831563 duration=746 
from=org.apache.hadoop.hive.ql.exec.MapJoinOperator>
2016-03-15 19:23:51,563 [INFO] [pool-47-thread-1] |tez.ObjectCache|: Caching 
key: 
svc-phx-efmhadoop_20160315191303_8c53ce88-e64f-4d36-bad0-846bbf096f57__HASH_MAP_MAPJOIN_127_container
2016-03-15 19:23:51,563 [INFO] [TezChild] |exec.HashTableDummyOperator|: 
Initializing operator HASHTABLEDUMMY[31]
2016-03-15 19:23:51,564 [INFO] [TezChild] |exec.MapJoinOperator|: Initializing 
operator MAPJOIN[27]
2016-03-15 19:23:51,566 [INFO] [TezChild] |exec.CommonJoinOperator|: JOIN 
struct<_col3:string,_col4:decimal(5,0),_col5:char(1),_col6:char(1),_col7:date,_col8:string,_col9:string,_col12:string,_col13:string,_col14:string,_col15:string,_col16:string,_col19:decimal(13,3),_col20:string,_col22:decimal(5,0),_col23:decimal(5,0),_col24:decimal(5,0),_col25:decimal(5,0),_col26:decimal(13,2),_col27:decimal(5,0),_col28:decimal(15,2),_col29:decimal(15,2),_col31:decimal(3,0),_col33:char(1),_col41:decimal(3,1),_col42:char(1),_col43:decimal(3,1),_col44:string,_col45:char(1),_col48:char(1),_col55:char(1),_col57:char(1),_col59:char(1),_col60:string,_col64:string,_col65:string,_col67:decimal(15,2),_col76:decimal(3,0),_col81:char(1),_col98:string,_col99:string,_col105:string,_col108:string,_col122:string,_col123:decimal(5,0),_col127:string,_col128:decimal(5,0),_col129:string,_col137:char(1),_col139:string,_col145:string,_col151:string,_col152:string,_col154:string,_col158:char(1),_col164:char(1),_col204:string,_col213:string,_col214:char(1),_col215:string,_col218:char(1),_col219:date,_col220:string,_col221:decimal(5,0),_col222:decimal(5,0),_col223:string,_col224:char(1),_col225:string,_col226:decimal(3,0),_col231:string,_col232:string,_col233:string,_col234:decimal(9,5),_col236:date,_col240:date,_col256:string,_col257:string,_col268:string,_col269:string,_col270:char(1),_col271:string,_col272:char(1),_col324:string,_col344:string,_col464:string,_col478:decimal(5,0),_col479:decimal(5,0),_col519:string,_col532:string,_col534:char(1),_col540:decimal(13,3),_col541:decimal(13,3),_col561:string>
 totalsz = 93
2016-03-15 19:23:51,566 [INFO] [pool-47-thread-1] |log.PerfLogger|: <PERFLOG 
method=LoadHashtable from=org.apache.hadoop.hive.ql.exec.MapJoinOperator>
2016-03-15 19:23:51,567 [INFO] [pool-47-thread-1] |tez.HashTableLoader|: Memory 
manager allocates 0 bytes for the loading hashtable.
2016-03-15 19:23:51,567 [INFO] [pool-47-thread-1] |persistence.HashMapWrapper|: 
Key count from statistics is 293380; setting map size to 391174
2016-03-15 19:23:51,567 [INFO] [pool-47-thread-1] 
|persistence.HybridHashTableContainer|: Total available memory: 1968177152
2016-03-15 19:23:51,567 [INFO] [pool-47-thread-1] 
|persistence.HybridHashTableContainer|: Estimated small table size: 69929471
2016-03-15 19:23:51,567 [INFO] [pool-47-thread-1] 
|persistence.HybridHashTableContainer|: Number of hash partitions to be 
created: 16
2016-03-15 19:23:51,567 [INFO] [pool-47-thread-1] 
|persistence.HybridHashTableContainer|: Write buffer size: 4194304
2016-03-15 19:23:51,568 [INFO] [pool-47-thread-1] 
|persistence.HybridHashTableContainer|: Number of partitions created: 16
2016-03-15 19:23:51,568 [INFO] [pool-47-thread-1] 
|persistence.HybridHashTableContainer|: Number of partitions spilled directly 
to disk on creation: 0
2016-03-15 19:23:51,568 [INFO] [pool-47-thread-1] |tez.HashTableLoader|: Using 
tableContainer HybridHashTableContainer
2016-03-15 19:23:51,569 [INFO] [pool-47-thread-1] 
|persistence.HybridHashTableContainer|: Initializing container with 
org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe and 
org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe
2016-03-15 19:23:51,980 [INFO] [pool-47-thread-1] |readers.UnorderedKVReader|: 
Num Records read: 586760


{code}
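
The figures in the excerpt can be pulled out mechanically to compare what 
each hashtable load was told is available against what it estimated it needs. 
The following is a small sketch, assuming the log line format shown above. 
HybridLogSummary and its members are hypothetical helper names, not part of 
Hive or Tez.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for reading the excerpt; not part of Hive or Tez. */
final class HybridLogSummary {

    static final Pattern AVAILABLE =
            Pattern.compile("Total available memory: (\\d+)");
    static final Pattern ESTIMATE =
            Pattern.compile("Estimated small table size: (\\d+)");

    /** Returns every number captured by the pattern, in log order. */
    static List<Long> extract(Pattern pattern, String log) {
        List<Long> values = new ArrayList<>();
        Matcher matcher = pattern.matcher(log);
        while (matcher.find()) {
            values.add(Long.parseLong(matcher.group(1)));
        }
        return values;
    }

    /** Sums the values; throws ArithmeticException on overflow. */
    static long sum(List<Long> values) {
        long total = 0L;
        for (long value : values) {
            total = Math.addExact(total, value);
        }
        return total;
    }

    public static void main(String[] args) {
        // The relevant lines from the excerpt, copied verbatim.
        String log = String.join("\n",
                "|persistence.HybridHashTableContainer|: Total available memory: 1968177152",
                "|persistence.HybridHashTableContainer|: Estimated small table size: 155190",
                "|persistence.HybridHashTableContainer|: Total available memory: 1968177152",
                "|persistence.HybridHashTableContainer|: Estimated small table size: 1324101915",
                "|persistence.HybridHashTableContainer|: Total available memory: 1968177152",
                "|persistence.HybridHashTableContainer|: Estimated small table size: 69929471");
        List<Long> available = extract(AVAILABLE, log);
        List<Long> estimates = extract(ESTIMATE, log);
        System.out.println("available per load: " + available);
        System.out.println("sum of estimates:   " + sum(estimates));
    }
}
```

In the excerpt, all three hashtable loads report the same total available 
memory, which is the behavior this issue describes.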



