[ 
https://issues.apache.org/jira/browse/SPARK-18879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15756759#comment-15756759
 ] 

Atul Payapilly commented on SPARK-18879:
----------------------------------------

That's a fair point: the feature was only supported incidentally, as a side effect of 
initially going through the HiveDriver, so it's technically not a regression of a 
supported feature.

To add some context, the reason I'm looking into this is that some of the 
functionality provided by Hive hooks is quite useful. Maybe I should close this and 
open a feature request instead? The functionality I think matters most is being 
able to tell, at the end of a query, which partitions were updated, specifically 
for a dynamic partition insert, since that can't be derived by analyzing the query 
alone (a rough sketch of the kind of hook I mean is below).
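
For concreteness, here is a minimal sketch of such a Hive post-execution hook: it 
reads the query's write entities and reports the partitions that were written. The 
class name PartitionAuditHook and the println are purely illustrative; the hook 
interfaces (ExecuteWithHookContext, HookContext) are Hive's, and the exact set of 
entities reported depends on the Hive version on the classpath.

{code}
import org.apache.hadoop.hive.ql.hooks.{Entity, ExecuteWithHookContext, HookContext}
import scala.collection.JavaConverters._

// Illustrative only: reports which partitions a finished query wrote to,
// e.g. after a dynamic partition insert.
class PartitionAuditHook extends ExecuteWithHookContext {
  override def run(hookContext: HookContext): Unit = {
    // getOutputs() is the set of write entities the query touched; each
    // partition written by a dynamic partition insert should appear here
    // as a PARTITION entity once the query has finished.
    val writtenPartitions = hookContext.getOutputs.asScala.collect {
      case out if out.getType == Entity.Type.PARTITION => out.getPartition.getName
    }
    writtenPartitions.foreach(p => println(s"query wrote partition: $p"))
  }
}
{code}

As far as I understand, under the old HiveDriver-backed execution path a hook like 
this would have been picked up via hive.exec.post.hooks, which is exactly the 
wiring the current code path no longer exercises.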

What are your thoughts on this?

> Spark SQL support for Hive hooks regressed
> ------------------------------------------
>
>                 Key: SPARK-18879
>                 URL: https://issues.apache.org/jira/browse/SPARK-18879
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0, 2.0.2
>            Reporter: Atul Payapilly
>
> As per the stack trace from this post: 
> http://ihorbobak.com/index.php/2015/05/08/113/
> run on Spark 1.3.1
> hive.exec.pre.hooks Class not found:org.apache.hadoop.hive.ql.hooks.ATSHook
> FAILED: Hive Internal Error: java.lang.ClassNotFoundException(org.apache.hadoop.hive.ql.hooks.ATSHook)
> java.lang.ClassNotFoundException: org.apache.hadoop.hive.ql.hooks.ATSHook
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>     at java.lang.Class.forName0(Native Method)
>     at java.lang.Class.forName(Class.java:270)
>     at org.apache.hadoop.hive.ql.hooks.HookUtils.getHooks(HookUtils.java:59)
>     at org.apache.hadoop.hive.ql.Driver.getHooks(Driver.java:1172)
>     at org.apache.hadoop.hive.ql.Driver.getHooks(Driver.java:1156)
>     at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1206)
>     at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1088)
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:911)
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:901)
>     at org.apache.spark.sql.hive.HiveContext.runHive(HiveContext.scala:318)
>     at org.apache.spark.sql.hive.HiveContext.runSqlHive(HiveContext.scala:290)
>     at org.apache.spark.sql.hive.execution.HiveNativeCommand.run(HiveNativeCommand.scala:33)
>     at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:54)
>     at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:54)
>     at org.apache.spark.sql.execution.ExecutedCommand.execute(commands.scala:64)
>     at org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:1099)
>     at org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:1099)
>     at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:147)
>     at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:130)
>     at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:51)
>     at org.apache.spark.sql.hive.HiveContext.sql(HiveContext.scala:101)
>     at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.run(Shim13.scala:164)
>     at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:231)
>     at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:218)
>     at org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:233)
>     at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:344)
>     at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1313)
>     at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1298)
>     at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
>     at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
>     at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:55)
>     at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:744)
> It looks like Spark used to rely on the Hive Driver for execution and therefore 
> supported Hive hooks. The current code path no longer goes through the Hive 
> Driver, so support for Hive hooks regressed. This is problematic: for example, 
> there is no way to tell which partitions were updated as part of a query.


