[ https://issues.apache.org/jira/browse/SPARK-35321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17339906#comment-17339906 ]

Chao Sun commented on SPARK-35321:
----------------------------------

[~xkrogen] yes, that can help to solve the issue, but users need to specify both 
{{spark.sql.hive.metastore.version}} and {{spark.sql.hive.metastore.jars}}. The 
latter is not so easy to set up: the {{maven}} option usually takes a very long 
time to download all the jars, while the {{path}} option requires users to 
download all the relevant Hive jars for the specific version themselves, which 
is tedious.
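
For reference, here is roughly what that configuration looks like today (a 
minimal sketch; the metastore version and jar path are made-up example values, 
not taken from this issue):
{code}
import org.apache.spark.sql.SparkSession;

public class OldHmsExample {
  public static void main(String[] args) {
    // Both metastore configs must be set before the first SparkSession is created.
    SparkSession spark = SparkSession.builder()
        .enableHiveSupport()
        .config("spark.sql.hive.metastore.version", "1.2.1") // example HMS version
        .config("spark.sql.hive.metastore.jars", "maven")    // slow: downloads all jars
        // Alternative: point to pre-downloaded Hive jars (tedious to assemble):
        // .config("spark.sql.hive.metastore.jars", "path")
        // .config("spark.sql.hive.metastore.jars.path", "/opt/hive-1.2.1/lib/*")
        .getOrCreate();
    spark.sql("SHOW DATABASES").show();
  }
}
{code}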

I think this specific issue is worth fixing in Spark itself regardless, since 
from what I can see Spark doesn't really need to load all the permanent 
functions when starting up the Hive client. The process can also be pretty 
expensive if there are many UDFs registered in the HMS.
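
A minimal sketch of how the registration could be skipped (this assumes a Hive 
version that ships {{Hive.getWithoutRegisterFns}}, added by HIVE-21563; the 
reflective fallback below is illustrative only, not necessarily how Spark would 
implement it):
{code}
import java.lang.reflect.Method;

import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hadoop.hive.ql.metadata.Hive;
import org.apache.hadoop.hive.ql.metadata.HiveException;

public final class HiveClientFactory {
  /**
   * Obtain a Hive client without triggering registerAllFunctionsOnce()
   * (and thus the get_all_functions Thrift call) when the underlying
   * Hive version supports it; otherwise fall back to the old behavior.
   */
  static Hive getHive(HiveConf conf) throws HiveException {
    try {
      // Only present in Hive versions that include HIVE-21563.
      Method m = Hive.class.getMethod("getWithoutRegisterFns", HiveConf.class);
      return (Hive) m.invoke(null, conf);
    } catch (ReflectiveOperationException e) {
      // Older Hive client jars: the method does not exist, so this path
      // still ends up calling get_all_functions on the metastore.
      return Hive.get(conf);
    }
  }
}
{code}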

> Spark 3.x can't talk to HMS 1.2.x and lower due to get_all_functions Thrift 
> API missing
> ---------------------------------------------------------------------------------------
>
>                 Key: SPARK-35321
>                 URL: https://issues.apache.org/jira/browse/SPARK-35321
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.0.2, 3.1.1, 3.2.0
>            Reporter: Chao Sun
>            Priority: Major
>
> https://issues.apache.org/jira/browse/HIVE-10319 introduced a new API 
> {{get_all_functions}} which is only supported in Hive 1.3.0/2.0.0 and up. 
> This is called when creating a new {{Hive}} object:
> {code}
>   private Hive(HiveConf c, boolean doRegisterAllFns) throws HiveException {
>     conf = c;
>     if (doRegisterAllFns) {
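>       // triggers the get_all_functions Thrift call against the metastore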
>       registerAllFunctionsOnce();
>     }
>   }
> {code}
> {{registerAllFunctionsOnce}} will reload all the permanent functions by 
> calling the {{get_all_functions}} API on the metastore. In Spark, we always 
> pass {{doRegisterAllFns}} as true, which causes the following failure:
> {code}
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> org.apache.thrift.TApplicationException: Invalid method name: 
> 'get_all_functions'
>       at 
> org.apache.hadoop.hive.ql.metadata.Hive.getAllFunctions(Hive.java:3897)
>       at 
> org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:248)
>       at 
> org.apache.hadoop.hive.ql.metadata.Hive.registerAllFunctionsOnce(Hive.java:231)
>       ... 96 more
> Caused by: org.apache.thrift.TApplicationException: Invalid method name: 
> 'get_all_functions'
>       at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:79)
>       at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_all_functions(ThriftHiveMetastore.java:3845)
>       at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_all_functions(ThriftHiveMetastore.java:3833)
> {code}
> It looks like Spark doesn't really need to call {{registerAllFunctionsOnce}}, 
> since it loads Hive permanent functions directly via the HMS API. The Hive 
> {{FunctionRegistry}} is only used for loading Hive built-in functions.


