[
https://issues.apache.org/jira/browse/SPARK-34971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
shezm updated SPARK-34971:
--------------------------
Description:
First, register a function in Hive with the following command:
{code:java}
create function shezm.hello as 'test.Hello' using jar
'hdfs:///udf_test/udf_test.jar'
{code}
Then create a view that uses the UDF in Hive 1.1, like:
{code:java}
create view shezm.test_view AS select shezm.hello(name) as v from shezm.test;
{code}
Reading the view with Spark then fails with:
{code:java}
Exception in thread "main" org.apache.spark.sql.AnalysisException: Undefined
function: 'shezm.hello'. This function is neither a registered temporary
function nor a permanent function registered in the database 'default'.; line 1
pos 7
at
org.apache.spark.sql.catalyst.analysis.Analyzer$LookupFunctions$$anonfun$apply$15$$anonfun$applyOrElse$51.apply(Analyzer.scala:1355)
at
org.apache.spark.sql.catalyst.analysis.Analyzer$LookupFunctions$$anonfun$apply$15$$anonfun$applyOrElse$51.apply(Analyzer.scala:1355)
at
org.apache.spark.sql.catalyst.analysis.Analyzer$LookupFunctions$$anonfun$apply$15$$anonfun$applyOrElse$51.apply(Analyzer.scala:1355)
at
org.apache.spark.sql.catalyst.analysis.package$.withPosition(package.scala:53)
at
org.apache.spark.sql.catalyst.analysis.Analyzer$LookupFunctions$$anonfun$apply$15.applyOrElse(Analyzer.scala:1354)
at
org.apache.spark.sql.catalyst.analysis.Analyzer$LookupFunctions$$anonfun$apply$15.applyOrElse(Analyzer.scala:1346)
at
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:256)
at
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:256)
......
{code}
While investigating this issue, I found that Hive 1.x wraps the whole UDF name in
backticks when creating a view with a UDF, like this:
{code:java}
hive> use shezm;
OK
Time taken: 0.999 seconds
hive> show create table test_view;
OK
CREATE VIEW `test_view` AS select `shezm.hello`(`test`.`id`) from `shezm`.`test`
Time taken: 1.761 seconds, Fetched: 1 row(s)
{code}
Spark treats the backquoted `shezm.hello` as a single UDF name and cannot split
out the database part (Hive can).
From reading SqlBase.g4, characters wrapped in backticks are treated as one
quoted identifier, which appears to be by design.
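To illustrate the quoting behavior described above, here is a minimal Java sketch (not Spark's actual parser) of splitting a function reference on dots while treating a backquoted span as a single identifier part; it shows why the unquoted name yields a (database, function) pair but Hive 1.x's `shezm.hello` yields one part:

```java
import java.util.ArrayList;
import java.util.List;

public class IdentifierSplit {
    // Splits a function reference on dots, except inside backticks,
    // where the dot is part of a single quoted identifier.
    static List<String> splitParts(String ref) {
        List<String> parts = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean quoted = false;
        for (char c : ref.toCharArray()) {
            if (c == '`') {
                quoted = !quoted;          // toggle quoted state, drop the backtick
            } else if (c == '.' && !quoted) {
                parts.add(cur.toString()); // unquoted dot separates name parts
                cur.setLength(0);
            } else {
                cur.append(c);
            }
        }
        parts.add(cur.toString());
        return parts;
    }

    public static void main(String[] args) {
        // Unquoted: two parts -> database "shezm", function "hello"
        System.out.println(splitParts("shezm.hello"));   // [shezm, hello]
        // Hive 1.x view text: backticks around the whole name -> one part
        System.out.println(splitParts("`shezm.hello`")); // [shezm.hello]
    }
}
```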
So perhaps this problem could be solved in AstBuilder#visitFunctionName() by
adding a case?
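A hypothetical sketch of such a fallback case, in plain Java rather than Spark's Scala AstBuilder (all names here are illustrative, not Spark's actual code): when the function name arrives as a single part that contains a dot, split it once into (database, function) instead of looking it up as-is in the current database.

```java
import java.util.List;

public class FunctionNameFallback {
    // Hypothetical fallback: a single backquoted identifier such as
    // `shezm.hello` arrives as one name part; split it once on the
    // first dot into (database, function). Already-qualified names
    // and plain function names pass through unchanged.
    static List<String> resolveFunctionName(List<String> parts) {
        if (parts.size() == 1 && parts.get(0).contains(".")) {
            String[] split = parts.get(0).split("\\.", 2);
            return List.of(split[0], split[1]); // (database, function)
        }
        return parts;
    }

    public static void main(String[] args) {
        // Hive 1.x view text: one part, falls back to splitting.
        System.out.println(resolveFunctionName(List.of("shezm.hello")));  // [shezm, hello]
        // Unquoted reference: already two parts, kept as-is.
        System.out.println(resolveFunctionName(List.of("shezm", "hello"))); // [shezm, hello]
    }
}
```

Whether this is safe depends on whether a permanent function name can legitimately contain a dot, which is why it is framed only as an extra case rather than a change to the quoted-identifier grammar rule.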
> The view with udf created by hive1.x cannot be read by spark
> ------------------------------------------------------------
>
> Key: SPARK-34971
> URL: https://issues.apache.org/jira/browse/SPARK-34971
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.4.0
> Environment: hive 1.1.0
> spark 2.4
> Reporter: shezm
> Priority: Minor
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]