[ https://issues.apache.org/jira/browse/SPARK-18879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15750895#comment-15750895 ]
Sean Owen commented on SPARK-18879:
-----------------------------------
I'm not quite clear on what you're reporting. Spark now uses later versions
of Hive than Spark 1.3.1 did, and Hive itself has changed. If Hive no longer
does X, that's not a Spark issue per se.
> Spark SQL support for Hive hooks regressed
> ------------------------------------------
>
> Key: SPARK-18879
> URL: https://issues.apache.org/jira/browse/SPARK-18879
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.0.0, 2.0.2
> Reporter: Atul Payapilly
>
> As per the stack trace from this post:
> http://ihorbobak.com/index.php/2015/05/08/113/
> run on Spark 1.3.1
> hive.exec.pre.hooks Class not found:org.apache.hadoop.hive.ql.hooks.ATSHook
> FAILED: Hive Internal Error: java.lang.ClassNotFoundException(org.apache.hadoop.hive.ql.hooks.ATSHook)
> java.lang.ClassNotFoundException: org.apache.hadoop.hive.ql.hooks.ATSHook
> at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
> at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
> at java.lang.Class.forName0(Native Method)
> at java.lang.Class.forName(Class.java:270)
> at org.apache.hadoop.hive.ql.hooks.HookUtils.getHooks(HookUtils.java:59)
> at org.apache.hadoop.hive.ql.Driver.getHooks(Driver.java:1172)
> at org.apache.hadoop.hive.ql.Driver.getHooks(Driver.java:1156)
> at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1206)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1088)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:911)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:901)
> at org.apache.spark.sql.hive.HiveContext.runHive(HiveContext.scala:318)
> at org.apache.spark.sql.hive.HiveContext.runSqlHive(HiveContext.scala:290)
> at org.apache.spark.sql.hive.execution.HiveNativeCommand.run(HiveNativeCommand.scala:33)
> at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:54)
> at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:54)
> at org.apache.spark.sql.execution.ExecutedCommand.execute(commands.scala:64)
> at org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:1099)
> at org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:1099)
> at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:147)
> at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:130)
> at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:51)
> at org.apache.spark.sql.hive.HiveContext.sql(HiveContext.scala:101)
> at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.run(Shim13.scala:164)
> at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:231)
> at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:218)
> at org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:233)
> at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:344)
> at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1313)
> at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1298)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
> at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:55)
> at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:744)
> It looks like Spark used to rely on the Hive Driver for execution, and
> therefore supported Hive hooks. The current code path no longer goes through
> the Hive Driver, so support for Hive hooks has regressed. This is
> problematic: for example, there is no way to tell which partitions were
> updated as part of a query.
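>
> For illustration, here is a minimal sketch of the kind of hook that no
> longer fires (the class and package names are made up; the interfaces are
> Hive's standard hook API, which the Hive Driver invokes around query
> execution):
>
>   package com.example.hooks; // hypothetical package
>
>   import java.util.Set;
>
>   import org.apache.hadoop.hive.ql.hooks.Entity;
>   import org.apache.hadoop.hive.ql.hooks.ExecuteWithHookContext;
>   import org.apache.hadoop.hive.ql.hooks.HookContext;
>   import org.apache.hadoop.hive.ql.hooks.WriteEntity;
>
>   // A post-execution hook that reports which partitions a query wrote.
>   public class PartitionAuditHook implements ExecuteWithHookContext {
>     @Override
>     public void run(HookContext hookContext) throws Exception {
>       // The Hive Driver fills in the query's write entities before
>       // invoking post-execution hooks.
>       Set<WriteEntity> outputs = hookContext.getOutputs();
>       for (WriteEntity output : outputs) {
>         if (output.getType() == Entity.Type.PARTITION) {
>           // Entity names look like "db@table@partspec",
>           // e.g. "default@clicks@ds=2016-12-14"
>           System.err.println("Updated partition: " + output.getName());
>         }
>       }
>     }
>   }
>
> A hook like this would be registered with e.g.
> "set hive.exec.post.hooks=com.example.hooks.PartitionAuditHook;". Because
> Spark 2.x no longer routes queries through org.apache.hadoop.hive.ql.Driver,
> HookUtils.getHooks is never reached and the hook never runs.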