Atul Payapilly created SPARK-18879:
--------------------------------------
Summary: Spark SQL support for Hive hooks regressed
Key: SPARK-18879
URL: https://issues.apache.org/jira/browse/SPARK-18879
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 2.0.2, 2.0.0
Reporter: Atul Payapilly
As per the stack trace from this post (run on Spark 1.3.1):
http://ihorbobak.com/index.php/2015/05/08/113/
hive.exec.pre.hooks Class not found:org.apache.hadoop.hive.ql.hooks.ATSHook
FAILED: Hive Internal Error: java.lang.ClassNotFoundException(org.apache.hadoop.hive.ql.hooks.ATSHook)
java.lang.ClassNotFoundException: org.apache.hadoop.hive.ql.hooks.ATSHook
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:270)
at org.apache.hadoop.hive.ql.hooks.HookUtils.getHooks(HookUtils.java:59)
at org.apache.hadoop.hive.ql.Driver.getHooks(Driver.java:1172)
at org.apache.hadoop.hive.ql.Driver.getHooks(Driver.java:1156)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1206)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1088)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:911)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:901)
at org.apache.spark.sql.hive.HiveContext.runHive(HiveContext.scala:318)
at org.apache.spark.sql.hive.HiveContext.runSqlHive(HiveContext.scala:290)
at org.apache.spark.sql.hive.execution.HiveNativeCommand.run(HiveNativeCommand.scala:33)
at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:54)
at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:54)
at org.apache.spark.sql.execution.ExecutedCommand.execute(commands.scala:64)
at org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:1099)
at org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:1099)
at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:147)
at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:130)
at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:51)
at org.apache.spark.sql.hive.HiveContext.sql(HiveContext.scala:101)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.run(Shim13.scala:164)
at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:231)
at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:218)
at org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:233)
at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:344)
at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1313)
at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1298)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:55)
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
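For context, Hive picks hooks up from the hive.exec.pre.hooks / hive.exec.post.hooks properties and instantiates each listed class reflectively (HookUtils.getHooks in the trace above), which is where the ClassNotFoundException originates. A minimal sketch of how such a hook is registered; the class name here is Hive's own ATSHook, and any value works as long as the class is on the classpath:

{code:java}
import org.apache.hadoop.hive.conf.HiveConf;

// Illustrative only. Hive's Driver reads this comma-separated list and
// loads each class via Class.forName, so a class missing from the
// classpath surfaces exactly as the ClassNotFoundException in the trace.
public class RegisterHookExample {
  public static void main(String[] args) {
    HiveConf conf = new HiveConf();
    conf.set("hive.exec.pre.hooks", "org.apache.hadoop.hive.ql.hooks.ATSHook");
  }
}
{code}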
It looks like Spark used to rely on the Hive Driver for execution, and therefore supported Hive hooks. The current code path does not go through the Hive Driver, so support for Hive hooks regressed. This is problematic: for example, there is no longer a way to tell which partitions were updated as part of a query.
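To make that use case concrete, here is a minimal sketch (not code from this report) of the kind of post-execution hook the regression disables, written against Hive's standard hook API (ExecuteWithHookContext / HookContext). The class name UpdatedPartitionsHook is hypothetical, and the exact accessor names should be checked against the Hive version in use:

{code:java}
import org.apache.hadoop.hive.ql.hooks.Entity;
import org.apache.hadoop.hive.ql.hooks.ExecuteWithHookContext;
import org.apache.hadoop.hive.ql.hooks.HookContext;
import org.apache.hadoop.hive.ql.hooks.WriteEntity;

// Hypothetical hook: after each query, report every partition in the
// query's write set. When registered via hive.exec.post.hooks, Hive's
// Driver invokes run() once execution finishes; that invocation is the
// step Spark's current code path skips.
public class UpdatedPartitionsHook implements ExecuteWithHookContext {
  @Override
  public void run(HookContext hookContext) throws Exception {
    for (WriteEntity output : hookContext.getOutputs()) {
      if (output.getType() == Entity.Type.PARTITION) {
        System.out.println("Updated partition: " + output.getPartition().getName());
      }
    }
  }
}
{code}

Under the old Driver-based path (Driver.run and getHooks in the trace above), Hive would invoke such a hook after every statement; since Spark SQL stopped going through the Hive Driver, nothing ever calls it.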