Yuming Wang created HIVE-14112:
----------------------------------

             Summary: Join a HBase mapped big table shouldn't convert to MapJoin
                 Key: HIVE-14112
                 URL: https://issues.apache.org/jira/browse/HIVE-14112
             Project: Hive
          Issue Type: Bug
          Components: StorageHandler
    Affects Versions: 1.1.0, 1.2.0
            Reporter: Yuming Wang
            Assignee: Yuming Wang
            Priority: Minor

There are two tables; _hbasetable_risk_control_defense_idx_uid_ is an HBase-mapped table:

{noformat}
[root@dev01 ~]# hadoop fs -du -s -h /hbase/data/tandem/hbase-table-risk-control-defense-idx-uid
3.0 G  9.0 G  /hbase/data/tandem/hbase-table-risk-control-defense-idx-uid
[root@dev01 ~]# hadoop fs -du -s -h /user/hive/warehouse/openapi_invoke_base
6.6 G  19.7 G  /user/hive/warehouse/openapi_invoke_base
{noformat}

The smaller table is 3.0 G, which is greater than both _hive.mapjoin.smalltable.filesize_ and _hive.auto.convert.join.noconditionaltask.size_. Nevertheless, when these tables are joined, Hive automatically converts the join to a map join:

{noformat}
hive> select count(*) from hbasetable_risk_control_defense_idx_uid t1 join openapi_invoke_base t2 on (t1.key=t2.merchantid);
Query ID = root_20160628092222_9f9d3f25-857b-412c-8a75-3d9228bd5ee5
Total jobs = 1
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512M; support was removed in 8.0
Execution log at: /tmp/root/root_20160628092222_9f9d3f25-857b-412c-8a75-3d9228bd5ee5.log
2016-06-28 09:22:10	Starting to launch local task to process map join;	maximum memory = 1908932608
{noformat}

The root cause is that Hive uses _/user/hive/warehouse/hbasetable_risk_control_defense_idx_uid_ as the table's location, but that directory is empty, so Hive treats the table as small and automatically converts the join to a map join.

My suggestion is to set the correct location when mapping an HBase table.
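
Until the table location is corrected, one possible workaround is to turn off automatic map-join conversion for the session before running the query. This is only a sketch: it is not part of the fix proposed in this issue and has not been verified on the affected cluster. _hive.auto.convert.join_ is a standard Hive setting, and running _set <property>;_ with no value prints the current value of that property.

{noformat}
-- Show the thresholds currently in effect
set hive.mapjoin.smalltable.filesize;
set hive.auto.convert.join.noconditionaltask.size;

-- Disable automatic map-join conversion for this session only
set hive.auto.convert.join=false;

select count(*) from hbasetable_risk_control_defense_idx_uid t1 join openapi_invoke_base t2 on (t1.key=t2.merchantid);
{noformat}

With conversion disabled, the join should run as a regular shuffle join instead of launching a local map-join task, at the cost of losing map joins for genuinely small tables in the same session.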