[ 
https://issues.apache.org/jira/browse/HIVE-12629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15115656#comment-15115656
 ] 

Chao Sun commented on HIVE-12629:
---------------------------------

This query actually executes successfully in Hive 2.1.0-SNAPSHOT, because the 
check for parent/child connections in SparkPlan was disabled in another commit. 
However, the duplicated MapWork is still present in the query plan, which could 
cause problems in other situations. This patch fixes that.

> hive.auto.convert.join=true makes lateral view join SQL fail on the Spark 
> engine on YARN
> --------------------------------------------------------------------------------------
>
>                 Key: HIVE-12629
>                 URL: https://issues.apache.org/jira/browse/HIVE-12629
>             Project: Hive
>          Issue Type: Bug
>          Components: Spark
>    Affects Versions: 1.2.1
>            Reporter: 吴子美
>            Assignee: Chao Sun
>         Attachments: HIVE-12629.1-spark.patch
>
>
> I am using Hive 1.2 on Spark on YARN.
> I found that the following query fails in Hive on Spark on YARN but succeeds 
> in Hive on MR:
> {code}
> select count(1) from
>   (select user_id from xxx group by user_id) a join
>   (select user_id from yyy lateral view json_tuple(u, 'h') v1 as h) b
> on a.user_id = b.user_id;
> {code}
> I tried the following SQL on Spark, and it worked:
> {code}
> select count(1) from
>   (select user_id from xxx group by user_id) a left join
>   (select user_id from yyy lateral view json_tuple(u, 'h') v1 as h) b
> on a.user_id = b.user_id;
> {code}
> When I turn hive.auto.convert.join from true to false, everything works fine.
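> For reference, that workaround is the session-level setting:
> {code}
> -- reported workaround: disable automatic map-join conversion
> set hive.auto.convert.join=false;
> {code}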
> The error message in hive.log was:
> {code}
> 2015-12-09 21:10:17,190 INFO  [stderr-redir-1]: client.SparkClientImpl 
> (SparkClientImpl.java:run(569)) - 15/12/09 21:10:17 INFO log.PerfLogger: 
> <PERFLOG method=serializePlan from=org.apache.hadoop.hive.ql.exec.Utilities>
> 2015-12-09 21:10:17,190 INFO  [stderr-redir-1]: client.SparkClientImpl 
> (SparkClientImpl.java:run(569)) - 15/12/09 21:10:17 INFO exec.Utilities: 
> Serializing ReduceWork via kryo
> 2015-12-09 21:10:17,214 INFO  [stderr-redir-1]: client.SparkClientImpl 
> (SparkClientImpl.java:run(569)) - 15/12/09 21:10:17 INFO log.PerfLogger: 
> </PERFLOG method=serializePlan start=1449666617190 end=1449666617214 
> duration=24 from=org.apache.hadoop.hive.ql.exec.Utilities>
> 2015-12-09 21:10:17,261 INFO  [stderr-redir-1]: client.SparkClientImpl 
> (SparkClientImpl.java:run(569)) - 15/12/09 21:10:17 INFO client.RemoteDriver: 
> Failed to run job 8fed1ca8-834f-497f-b189-eab343440a9f
> 2015-12-09 21:10:17,261 INFO  [stderr-redir-1]: client.SparkClientImpl 
> (SparkClientImpl.java:run(569)) - java.lang.IllegalStateException: Connection 
> already exists
> 2015-12-09 21:10:17,261 INFO  [stderr-redir-1]: client.SparkClientImpl 
> (SparkClientImpl.java:run(569)) -      at 
> org.apache.hadoop.hive.ql.exec.spark.SparkPlan.connect(SparkPlan.java:142)
> 2015-12-09 21:10:17,261 INFO  [stderr-redir-1]: client.SparkClientImpl 
> (SparkClientImpl.java:run(569)) -      at 
> org.apache.hadoop.hive.ql.exec.spark.SparkPlanGenerator.generateParentTran(SparkPlanGenerator.java:142)
> 2015-12-09 21:10:17,261 INFO  [stderr-redir-1]: client.SparkClientImpl 
> (SparkClientImpl.java:run(569)) -      at 
> org.apache.hadoop.hive.ql.exec.spark.SparkPlanGenerator.generate(SparkPlanGenerator.java:106)
> 2015-12-09 21:10:17,261 INFO  [stderr-redir-1]: client.SparkClientImpl 
> (SparkClientImpl.java:run(569)) -      at 
> org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient$JobStatusJob.call(RemoteHiveSparkClient.java:252)
> 2015-12-09 21:10:17,261 INFO  [stderr-redir-1]: client.SparkClientImpl 
> (SparkClientImpl.java:run(569)) -      at 
> org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:366)
> 2015-12-09 21:10:17,261 INFO  [stderr-redir-1]: client.SparkClientImpl 
> (SparkClientImpl.java:run(569)) -      at 
> org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:335)
> 2015-12-09 21:10:17,261 INFO  [stderr-redir-1]: client.SparkClientImpl 
> (SparkClientImpl.java:run(569)) -      at 
> java.util.concurrent.FutureTask.run(FutureTask.java:262)
> 2015-12-09 21:10:17,262 INFO  [stderr-redir-1]: client.SparkClientImpl 
> (SparkClientImpl.java:run(569)) -      at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> 2015-12-09 21:10:17,262 INFO  [stderr-redir-1]: client.SparkClientImpl 
> (SparkClientImpl.java:run(569)) -      at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> 2015-12-09 21:10:17,262 INFO  [stderr-redir-1]: client.SparkClientImpl 
> (SparkClientImpl.java:run(569)) -      at 
> java.lang.Thread.run(Thread.java:745)
> 2015-12-09 21:10:17,266 INFO  [RPC-Handler-3]: client.SparkClientImpl 
> (SparkClientImpl.java:handle(522)) - Received result for 
> 8fed1ca8-834f-497f-b189-eab343440a9f
> 2015-12-09 21:10:18,054 ERROR [HiveServer2-Background-Pool: Thread-43]: 
> status.SparkJobMonitor (SessionState.java:printError(960)) - Status: Failed
> 2015-12-09 21:10:18,055 INFO  [HiveServer2-Background-Pool: Thread-43]: 
> log.PerfLogger (PerfLogger.java:PerfLogEnd(148)) - </PERFLOG 
> method=SparkRunJob start=1449666615051 end=1449666618055 duration=3004 
> from=org.apache.hadoop.hive.ql.exec.spark.status.SparkJobMonitor>
> 2015-12-09 21:10:18,076 ERROR [HiveServer2-Background-Pool: Thread-43]: 
> ql.Driver (SessionState.java:printError(960)) - FAILED: Execution Error, 
> return code 3 from org.apache.hadoop.hive.ql.exec.spark.SparkTask
> {code}
> Is this a bug in Hive on Spark?


