[ https://issues.apache.org/jira/browse/SPARK-18879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15751944#comment-15751944 ]

Atul Payapilly commented on SPARK-18879:
----------------------------------------

Sorry about the lack of details; let me clarify. The version of the Hive 
libraries (1.2.1) that the current version of Spark (2.0.x) uses does support 
execution hooks.

What has changed is the extent to which Spark exercises those libraries: it no 
longer goes through the org.apache.hadoop.hive.ql.Driver.run code path, so 
Hive hooks are never invoked.
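
For reference, a pre-execution hook under Hive 1.2.1 is just a class on the 
classpath that implements ExecuteWithHookContext and is listed in 
hive.exec.pre.hooks; Driver.run is what looks it up and invokes it. A minimal 
sketch (the class and package names here are made up for illustration):

import org.apache.hadoop.hive.ql.hooks.{ExecuteWithHookContext, HookContext}
import scala.collection.JavaConverters._

// Invoked by org.apache.hadoop.hive.ql.Driver.run when configured as
// hive.exec.pre.hooks=com.example.PartitionAuditHook (hypothetical name).
class PartitionAuditHook extends ExecuteWithHookContext {
  override def run(hookContext: HookContext): Unit = {
    // getOutputs() is the set of WriteEntity objects (tables/partitions)
    // the statement writes to -- exactly the information we lose today.
    hookContext.getOutputs.asScala.foreach { output =>
      println(s"query ${hookContext.getQueryPlan.getQueryId} writes: $output")
    }
  }
}

Since Spark 2.0 never enters Driver.run, a hook like this is never called even 
though the interface is present in the bundled Hive 1.2.1 jars.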

What can be done to add support back? Should we also analyze whether any other 
gaps were introduced when Spark moved away from using the Hive Driver for SQL 
execution?
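
In the meantime, the closest Spark-side callback I can find is 
org.apache.spark.sql.util.QueryExecutionListener, but it only exposes Spark's 
query plans, not the Hive-level inputs/outputs that a hook receives, so it 
does not close this gap. A rough sketch of what it does give you:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.QueryExecution
import org.apache.spark.sql.util.QueryExecutionListener

val spark = SparkSession.builder().enableHiveSupport().getOrCreate()

// Fires after each action, but only hands back Spark plans; there is no
// equivalent of HookContext.getOutputs to say which partitions were written.
spark.listenerManager.register(new QueryExecutionListener {
  override def onSuccess(funcName: String, qe: QueryExecution, durationNs: Long): Unit =
    println(s"$funcName succeeded:\n${qe.analyzed.treeString}")
  override def onFailure(funcName: String, qe: QueryExecution, exception: Exception): Unit =
    println(s"$funcName failed: ${exception.getMessage}")
})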



> Spark SQL support for Hive hooks regressed
> ------------------------------------------
>
>                 Key: SPARK-18879
>                 URL: https://issues.apache.org/jira/browse/SPARK-18879
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0, 2.0.2
>            Reporter: Atul Payapilly
>
> As per the stack trace from this post: 
> http://ihorbobak.com/index.php/2015/05/08/113/
> run on Spark 1.3.1
> hive.exec.pre.hooks Class not found:org.apache.hadoop.hive.ql.hooks.ATSHook
> FAILED: Hive Internal Error: 
> java.lang.ClassNotFoundException(org.apache.hadoop.hive.ql.hooks.ATSHook)
> java.lang.ClassNotFoundException: org.apache.hadoop.hive.ql.hooks.ATSHook
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>     at java.lang.Class.forName0(Native Method)
>     at java.lang.Class.forName(Class.java:270)
>     at org.apache.hadoop.hive.ql.hooks.HookUtils.getHooks(HookUtils.java:59)
>     at org.apache.hadoop.hive.ql.Driver.getHooks(Driver.java:1172)
>     at org.apache.hadoop.hive.ql.Driver.getHooks(Driver.java:1156)
>     at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1206)
>     at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1088)
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:911)
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:901)
>     at org.apache.spark.sql.hive.HiveContext.runHive(HiveContext.scala:318)
>     at org.apache.spark.sql.hive.HiveContext.runSqlHive(HiveContext.scala:290)
>     at 
> org.apache.spark.sql.hive.execution.HiveNativeCommand.run(HiveNativeCommand.scala:33)
>     at 
> org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:54)
>     at 
> org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:54)
>     at 
> org.apache.spark.sql.execution.ExecutedCommand.execute(commands.scala:64)
>     at 
> org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:1099)
>     at 
> org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:1099)
>     at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:147)
>     at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:130)
>     at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:51)
>     at org.apache.spark.sql.hive.HiveContext.sql(HiveContext.scala:101)
>     at 
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.run(Shim13.scala:164)
>     at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:231)
>     at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:218)
>     at 
> org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:233)
>     at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:344)
>     at 
> org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1313)
>     at 
> org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1298)
>     at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
>     at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
>     at 
> org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:55)
>     at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206)
>     at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:744)
> It looks like Spark used to rely on the Hive Driver for execution and 
> supported Hive hooks. The current code path does not rely on the Hive Driver, 
> so support for Hive hooks regressed. This is problematic: for example, there 
> is no way to tell which partitions were updated as part of a query.


