JoneZhang created HIVE-12716:
--------------------------------

             Summary: Hive on Spark map-join throw NullPointerException
                 Key: HIVE-12716
                 URL: https://issues.apache.org/jira/browse/HIVE-12716
             Project: Hive
          Issue Type: Bug
    Affects Versions: 1.2.1, 1.1.1
            Reporter: JoneZhang

The query is:

set hive.execution.engine=spark;
select t3.pcid,channel,version,ip,hour,app_id,app_name,app_apk,app_version,app_type,dwl_tool,dwl_status,err_type,dwl_store,dwl_maxspeed,dwl_minspeed,dwl_avgspeed,last_time,dwl_num,
       (case when t4.cnt is null then 0 else 1 end) as is_evil
from (
  select /*+mapjoin(t2)*/ pcid,channel,version,ip,hour,
         (case when t2.app_id is null then t1.app_id else t2.app_id end) as app_id,
         t2.name as app_name,
         app_apk,
         app_version,app_type,dwl_tool,dwl_status,err_type,dwl_store,dwl_maxspeed,dwl_minspeed,dwl_avgspeed,last_time,dwl_num
  from t_ed_soft_downloadlog_molo t1
  left outer join t_rd_soft_app_pkg_name t2
    on (lower(t1.app_apk) = lower(t2.package_id) and t1.ds = 20151217 and t2.ds = 20151217)
  where t1.ds = 20151217
) t3
left outer join (
  select pcid,count(1) cnt
  from t_ed_soft_evillog_molo
  where ds=20151217
  group by pcid
) t4
  on t3.pcid=t4.pcid;

The create table statements are as follows:

CREATE TABLE `t_ed_soft_downloadlog_molo`(
  `pcid` string,
  `channel` string,
  `version` string,
  `ip` string,
  `hour` string,
  `app_id` bigint,
  `app_name` string,
  `app_apk` string,
  `app_version` string,
  `app_type` string,
  `dwl_tool` string,
  `dwl_status` string,
  `err_type` string,
  `dwl_store` string,
  `dwl_maxspeed` string,
  `dwl_minspeed` string,
  `dwl_avgspeed` string,
  `last_time` date,
  `dwl_num` int)
PARTITIONED BY (
  `ds` bigint);

CREATE TABLE `t_rd_soft_app_pkg_name`(
  `app_id` bigint,
  `cp_id` bigint,
  `cat_id` bigint,
  `package_id` string,
  `name` string)
PARTITIONED BY (
  `ds` bigint);

CREATE TABLE `t_ed_soft_evillog_molo`(
  `imp_date` string,
  `uin` string,
  `pcid` string,
  `appid` string,
  `domain` string,
  `action_type` string,
  `via` string)
PARTITIONED BY (
  `ds` bigint);

The error log is:

2015-12-18 08:10:18,685 INFO  [main]: spark.SparkMapJoinOptimizer (SparkMapJoinOptimizer.java:process(79)) - Check if it can be converted to map join
2015-12-18 08:10:18,686 ERROR [main]: ql.Driver (SessionState.java:printError(966)) - FAILED: NullPointerException null
java.lang.NullPointerException
        at org.apache.hadoop.hive.ql.optimizer.spark.SparkMapJoinOptimizer.getConnectedParentMapJoinSize(SparkMapJoinOptimizer.java:312)
        at org.apache.hadoop.hive.ql.optimizer.spark.SparkMapJoinOptimizer.getConnectedMapJoinSize(SparkMapJoinOptimizer.java:292)
        at org.apache.hadoop.hive.ql.optimizer.spark.SparkMapJoinOptimizer.getMapJoinConversionInfo(SparkMapJoinOptimizer.java:271)
        at org.apache.hadoop.hive.ql.optimizer.spark.SparkMapJoinOptimizer.process(SparkMapJoinOptimizer.java:80)
        at org.apache.hadoop.hive.ql.optimizer.spark.SparkJoinOptimizer.process(SparkJoinOptimizer.java:58)
        at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:92)
        at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:97)
        at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:81)
        at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:135)
        at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:112)
        at org.apache.hadoop.hive.ql.parse.spark.SparkCompiler.optimizeOperatorPlan(SparkCompiler.java:128)
        at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:102)
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10238)
        at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:210)
        at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:233)
        at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:425)
        at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:308)
        at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1123)
        at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1171)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1060)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1050)
        at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:208)
        at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:160)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:447)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:357)
        at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:795)
        at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:767)
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:704)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:212)

Some relevant properties in hive-site.xml are:

<property>
  <name>hive.ignore.mapjoin.hint</name>
  <value>false</value>
</property>
<property>
  <name>hive.auto.convert.join</name>
  <value>true</value>
</property>
<property>
  <name>hive.auto.convert.join.noconditionaltask</name>
  <value>true</value>
</property>

The code relevant to the error is:

long mjSize = ctx.getMjOpSizes().get(op);

I think it should check whether ctx.getMjOpSizes().get(op) is null before assigning it to a primitive long. Of course, any stricter logic is for you to decide.

Thanks.
Best Wishes.
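
To illustrate the suspected cause, here is a minimal standalone sketch that does not use any Hive classes. It assumes that getMjOpSizes() returns a map with boxed Long values, so that assigning a missing entry to a primitive long fails during auto-unboxing. The class name MapJoinSizeLookup, the String keys, and the fallback value of 0 are hypothetical choices for illustration only; the right fallback in SparkMapJoinOptimizer may well be different.

```java
import java.util.HashMap;
import java.util.Map;

public class MapJoinSizeLookup {

    // Mirrors the reported pattern: Map.get returns a Long (possibly null),
    // and assigning it to a primitive long auto-unboxes it.
    static long unsafeLookup(Map<String, Long> sizes, String op) {
        long mjSize = sizes.get(op); // NullPointerException if op has no entry
        return mjSize;
    }

    // Null-checked variant. Falling back to 0 is an assumption made here,
    // not necessarily what the optimizer should do.
    static long safeLookup(Map<String, Long> sizes, String op) {
        Long mjSize = sizes.get(op);
        return mjSize != null ? mjSize : 0L;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for ctx.getMjOpSizes(), keyed by operator name.
        Map<String, Long> sizes = new HashMap<>();
        sizes.put("MAPJOIN_1", 1024L);

        System.out.println(safeLookup(sizes, "MAPJOIN_1"));
        System.out.println(safeLookup(sizes, "MAPJOIN_2"));

        try {
            unsafeLookup(sizes, "MAPJOIN_2");
        } catch (NullPointerException e) {
            System.out.println("NullPointerException on missing entry");
        }
    }
}
```

The unchecked lookup fails only for operators that were never recorded in the map, which would be consistent with the query failing at compile time in getConnectedParentMapJoinSize rather than at execution time.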