[ https://issues.apache.org/jira/browse/HIVE-12629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15050900#comment-15050900 ]
吴子美 commented on HIVE-12629:
----------------------------

I will give you a simple dataset to show the bug.

hive> desc logs1;
OK
iik                     int
created_at              string
Time taken: 0.145 seconds, Fetched: 2 row(s)
hive> select * from logs1;
OK
1       js
5       9wj
1       js
5       9wj
56      io
1       js
5       9wj
1       js
5       9wj
Time taken: 0.687 seconds, Fetched: 9 row(s)
hive> select count(*) from
    > (select iik from logs1 group by iik) a join
    > (select iik from logs1 LATERAL VIEW json_tuple(created_at,'ss') v1 as ss) b on a.iik=b.iik;
Query ID = root_20151210201756_e56dadee-69c9-4d4e-838d-98173eab25ec
Total jobs = 2
Launching Job 1 out of 2
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Spark Job = 031c6e38-d2d3-4b19-baa7-de1553cd7277
Status: Failed
FAILED: Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.spark.SparkTask

After I change hive.auto.convert.join from true to false, the result is OK. (A consolidated reproduction script and the confirmed workaround are sketched after the quoted issue description below.)

> hive.auto.convert.join=true makes lateral view join SQL fail on the Spark
> engine on YARN
> --------------------------------------------------------------------------------------
>
>                 Key: HIVE-12629
>                 URL: https://issues.apache.org/jira/browse/HIVE-12629
>             Project: Hive
>          Issue Type: Bug
>          Components: Spark
>    Affects Versions: 1.2.1
>            Reporter: 吴子美
>            Assignee: Xuefu Zhang
>
> I am using Hive 1.2 on Spark on YARN.
> I found that
> select count(1) from
> (select user_id from xxx group by user_id) a join
> (select user_id from yyy lateral view json_tuple(u, 'h') v1 as h) b
> on a.user_id=b.user_id;
> fails in Hive on Spark on YARN, but is OK in Hive on MR.
> I tried the following SQL on Spark, and it was OK:
> select count(1) from
> (select user_id from xxx group by user_id) a left join
> (select user_id from yyy lateral view json_tuple(u, 'h') v1 as h) b
> on a.user_id=b.user_id;
> When I turn hive.auto.convert.join from true to false, everything goes OK.
> The error message in hive.log was:
> {code}
> 2015-12-09 21:10:17,190 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(569)) - 15/12/09 21:10:17 INFO log.PerfLogger: <PERFLOG method=serializePlan from=org.apache.hadoop.hive.ql.exec.Utilities>
> 2015-12-09 21:10:17,190 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(569)) - 15/12/09 21:10:17 INFO exec.Utilities: Serializing ReduceWork via kryo
> 2015-12-09 21:10:17,214 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(569)) - 15/12/09 21:10:17 INFO log.PerfLogger: </PERFLOG method=serializePlan start=1449666617190 end=1449666617214 duration=24 from=org.apache.hadoop.hive.ql.exec.Utilities>
> 2015-12-09 21:10:17,261 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(569)) - 15/12/09 21:10:17 INFO client.RemoteDriver: Failed to run job 8fed1ca8-834f-497f-b189-eab343440a9f
> 2015-12-09 21:10:17,261 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(569)) - java.lang.IllegalStateException: Connection already exists
> 2015-12-09 21:10:17,261 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(569)) - 	at org.apache.hadoop.hive.ql.exec.spark.SparkPlan.connect(SparkPlan.java:142)
> 2015-12-09 21:10:17,261 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(569)) - 	at org.apache.hadoop.hive.ql.exec.spark.SparkPlanGenerator.generateParentTran(SparkPlanGenerator.java:142)
> 2015-12-09 21:10:17,261 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(569)) - 	at org.apache.hadoop.hive.ql.exec.spark.SparkPlanGenerator.generate(SparkPlanGenerator.java:106)
> 2015-12-09 21:10:17,261 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(569)) - 	at org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient$JobStatusJob.call(RemoteHiveSparkClient.java:252)
> 2015-12-09 21:10:17,261 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(569)) - 	at org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:366)
> 2015-12-09 21:10:17,261 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(569)) - 	at org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:335)
> 2015-12-09 21:10:17,261 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(569)) - 	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> 2015-12-09 21:10:17,262 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(569)) - 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> 2015-12-09 21:10:17,262 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(569)) - 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> 2015-12-09 21:10:17,262 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(569)) - 	at java.lang.Thread.run(Thread.java:745)
> 2015-12-09 21:10:17,266 INFO [RPC-Handler-3]: client.SparkClientImpl (SparkClientImpl.java:handle(522)) - Received result for 8fed1ca8-834f-497f-b189-eab343440a9f
> 2015-12-09 21:10:18,054 ERROR [HiveServer2-Background-Pool: Thread-43]: status.SparkJobMonitor (SessionState.java:printError(960)) - Status: Failed
> 2015-12-09 21:10:18,055 INFO [HiveServer2-Background-Pool: Thread-43]: log.PerfLogger (PerfLogger.java:PerfLogEnd(148)) - </PERFLOG method=SparkRunJob start=1449666615051 end=1449666618055 duration=3004 from=org.apache.hadoop.hive.ql.exec.spark.status.SparkJobMonitor>
> 2015-12-09 21:10:18,076 ERROR [HiveServer2-Background-Pool: Thread-43]: ql.Driver (SessionState.java:printError(960)) - FAILED: Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.spark.SparkTask
> {code}
> Is this a bug in Hive on Spark?
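
For convenience, the session above can be condensed into a single script. The table layout and sample rows are taken verbatim from the desc logs1 and select * output in the comment; the two set statements are assumptions about the reporter's session (Hive on Spark, with the default hive.auto.convert.join=true), not something shown in the log.

{code}
-- Minimal reproduction sketch, assuming a working Hive-on-Spark-on-YARN setup.
set hive.execution.engine=spark;   -- assumed: the reporter runs Hive on Spark
set hive.auto.convert.join=true;   -- the default; this setting triggers the failure

-- Table layout taken from the `desc logs1` output above.
create table logs1 (iik int, created_at string);

-- Rows matching the `select * from logs1` output above.
insert into table logs1 values
  (1, 'js'), (5, '9wj'), (1, 'js'), (5, '9wj'), (56, 'io'),
  (1, 'js'), (5, '9wj'), (1, 'js'), (5, '9wj');

-- Fails with "return code 3 from ...SparkTask" while map-join conversion is on.
select count(*) from
  (select iik from logs1 group by iik) a join
  (select iik from logs1 lateral view json_tuple(created_at, 'ss') v1 as ss) b
on a.iik = b.iik;
{code}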
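The workaround the reporter confirms is to disable automatic map-join conversion before running the same query. Note that the setting is session-scoped and affects every join in the session, so it trades the failure for potentially slower shuffle joins.

{code}
-- Workaround confirmed in the comment above: fall back to a regular
-- (shuffle) join instead of converting to a map join. Session-scoped only.
set hive.auto.convert.join=false;

select count(*) from
  (select iik from logs1 group by iik) a join
  (select iik from logs1 lateral view json_tuple(created_at, 'ss') v1 as ss) b
on a.iik = b.iik;
{code}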