Atul Payapilly created SPARK-18879:
--------------------------------------

             Summary: Spark SQL support for Hive hooks regressed
                 Key: SPARK-18879
                 URL: https://issues.apache.org/jira/browse/SPARK-18879
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.0.2, 2.0.0
            Reporter: Atul Payapilly


As per the stack trace from this post (run on Spark 1.3.1):
http://ihorbobak.com/index.php/2015/05/08/113/

hive.exec.pre.hooks Class not found:org.apache.hadoop.hive.ql.hooks.ATSHook

FAILED: Hive Internal Error: java.lang.ClassNotFoundException(org.apache.hadoop.hive.ql.hooks.ATSHook)

java.lang.ClassNotFoundException: org.apache.hadoop.hive.ql.hooks.ATSHook
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:270)
    at org.apache.hadoop.hive.ql.hooks.HookUtils.getHooks(HookUtils.java:59)
    at org.apache.hadoop.hive.ql.Driver.getHooks(Driver.java:1172)
    at org.apache.hadoop.hive.ql.Driver.getHooks(Driver.java:1156)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1206)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1088)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:911)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:901)
    at org.apache.spark.sql.hive.HiveContext.runHive(HiveContext.scala:318)
    at org.apache.spark.sql.hive.HiveContext.runSqlHive(HiveContext.scala:290)
    at org.apache.spark.sql.hive.execution.HiveNativeCommand.run(HiveNativeCommand.scala:33)
    at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:54)
    at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:54)
    at org.apache.spark.sql.execution.ExecutedCommand.execute(commands.scala:64)
    at org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:1099)
    at org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:1099)
    at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:147)
    at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:130)
    at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:51)
    at org.apache.spark.sql.hive.HiveContext.sql(HiveContext.scala:101)
    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.run(Shim13.scala:164)
    at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:231)
    at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:218)
    at org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:233)
    at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:344)
    at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1313)
    at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1298)
    at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
    at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
    at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:55)
    at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
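The failure mode in the trace is plain reflective class loading: Hive resolves each class name listed in hive.exec.pre.hooks via Class.forName, so a hook class missing from the classpath surfaces as a ClassNotFoundException inside HookUtils.getHooks. A minimal sketch of that resolution step (hypothetical illustration, not Hive's actual HookUtils code):

```java
// Sketch (assumption: not Hive's real code) of how a hook class name
// from hive.exec.pre.hooks is resolved reflectively. The Class.forName
// call here is the same call that fails in HookUtils.getHooks when the
// hook jar is not on Spark's classpath.
public class HookLoadSketch {

    // Resolve and instantiate a single hook class by fully qualified name.
    static Object loadHook(String className) throws Exception {
        Class<?> cls = Class.forName(className);  // throws ClassNotFoundException
        return cls.getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) {
        try {
            // ATSHook lives in a Hive jar, so on a plain JVM classpath this
            // reproduces the reported error.
            loadHook("org.apache.hadoop.hive.ql.hooks.ATSHook");
        } catch (ClassNotFoundException e) {
            System.out.println("Class not found: " + e.getMessage());
        } catch (Exception e) {
            System.out.println("Failed to instantiate hook: " + e);
        }
    }
}
```

So the 1.3.1 error above is a classpath problem surfaced *through* working hook support; the regression reported here is that newer Spark versions never reach this resolution step at all.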

It looks like Spark used to rely on the Hive Driver for execution, and Hive hooks 
(e.g. hive.exec.pre.hooks) were honored as a side effect. The current code path no 
longer goes through the Hive Driver, so support for Hive hooks has regressed. This 
is problematic: for example, there is no longer a way to tell which partitions were 
updated as part of a query.
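For context, hooks are wired up purely through configuration, and a post-execution hook is the usual way to learn what a query touched (the HookContext handed to the hook exposes the query's read/write entities, including partitions). A minimal sketch of the configuration the reporter expects Spark SQL to honor, where com.example.PartitionAuditHook is a hypothetical hook class used only for illustration:

```xml
<!-- hive-site.xml: register a (hypothetical) hook to run after each query -->
<property>
  <name>hive.exec.post.hooks</name>
  <value>com.example.PartitionAuditHook</value>
</property>
```

With the pre-1.x-style code path this class would be loaded and invoked by the Hive Driver after every statement; on current Spark the setting is silently ignored, which is the regression being reported.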



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
