[ https://issues.apache.org/jira/browse/SPARK-18879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15750895#comment-15750895 ]
Sean Owen commented on SPARK-18879:
-----------------------------------
I'm not quite clear on what you're reporting. Spark now uses later versions
of Hive than Spark 1.3.1 did, and Hive itself has changed. If Hive no longer
does X, that's not a Spark issue per se.
> Spark SQL support for Hive hooks regressed
> ------------------------------------------
>
> Key: SPARK-18879
> URL: https://issues.apache.org/jira/browse/SPARK-18879
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.0.0, 2.0.2
> Reporter: Atul Payapilly
>
> As per the stack trace from this post:
> http://ihorbobak.com/index.php/2015/05/08/113/
> run on Spark 1.3.1
> hive.exec.pre.hooks Class not found:org.apache.hadoop.hive.ql.hooks.ATSHook
> FAILED: Hive Internal Error: java.lang.ClassNotFoundException(org.apache.hadoop.hive.ql.hooks.ATSHook)
> java.lang.ClassNotFoundException: org.apache.hadoop.hive.ql.hooks.ATSHook
> at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
> at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
> at java.lang.Class.forName0(Native Method)
> at java.lang.Class.forName(Class.java:270)
> at org.apache.hadoop.hive.ql.hooks.HookUtils.getHooks(HookUtils.java:59)
> at org.apache.hadoop.hive.ql.Driver.getHooks(Driver.java:1172)
> at org.apache.hadoop.hive.ql.Driver.getHooks(Driver.java:1156)
> at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1206)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1088)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:911)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:901)
> at org.apache.spark.sql.hive.HiveContext.runHive(HiveContext.scala:318)
> at org.apache.spark.sql.hive.HiveContext.runSqlHive(HiveContext.scala:290)
> at org.apache.spark.sql.hive.execution.HiveNativeCommand.run(HiveNativeCommand.scala:33)
> at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:54)
> at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:54)
> at org.apache.spark.sql.execution.ExecutedCommand.execute(commands.scala:64)
> at org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:1099)
> at org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:1099)
> at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:147)
> at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:130)
> at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:51)
> at org.apache.spark.sql.hive.HiveContext.sql(HiveContext.scala:101)
> at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.run(Shim13.scala:164)
> at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:231)
> at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:218)
> at org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:233)
> at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:344)
> at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1313)
> at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1298)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
> at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:55)
> at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:744)
> It looks like Spark used to rely on the Hive Driver for execution, and
> therefore supported Hive hooks. The current code path no longer goes through
> the Hive Driver, so support for Hive hooks has regressed. This is
> problematic: for example, there is no way to tell which partitions were
> updated as part of a query.
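>
> For illustration, here is a minimal sketch of the kind of hook that no
> longer fires (the class and package names are made up; the interfaces are
> Hive's standard hook API, which the Hive Driver invokes around query
> execution):
>
>   package com.example.hooks; // hypothetical package
>
>   import java.util.Set;
>
>   import org.apache.hadoop.hive.ql.hooks.Entity;
>   import org.apache.hadoop.hive.ql.hooks.ExecuteWithHookContext;
>   import org.apache.hadoop.hive.ql.hooks.HookContext;
>   import org.apache.hadoop.hive.ql.hooks.WriteEntity;
>
>   // A post-execution hook that reports which partitions a query wrote.
>   public class PartitionAuditHook implements ExecuteWithHookContext {
>     @Override
>     public void run(HookContext hookContext) throws Exception {
>       // The Hive Driver fills in the query's write entities before
>       // invoking post-execution hooks.
>       Set<WriteEntity> outputs = hookContext.getOutputs();
>       for (WriteEntity output : outputs) {
>         if (output.getType() == Entity.Type.PARTITION) {
>           // Entity names look like "db@table@partspec",
>           // e.g. "default@clicks@ds=2016-12-14"
>           System.err.println("Updated partition: " + output.getName());
>         }
>       }
>     }
>   }
>
> A hook like this would be registered with e.g.
> "set hive.exec.post.hooks=com.example.hooks.PartitionAuditHook;". Because
> Spark 2.x no longer routes queries through org.apache.hadoop.hive.ql.Driver,
> HookUtils.getHooks is never reached and the hook never runs.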