Hello,

(I just asked the same question on Stack Overflow; I am not sure where I
will get an answer faster. If I do, I will update the other side.)

I am practicing HDFS, Hive, and Spark. I installed Hadoop 3.3.6, Hive 3.1.3,
and Spark 3.4.2, but I am unable to run any SQL in the pyspark shell. The
error I get is:

org.apache.thrift.TApplicationException: Invalid method name: 'get_database'

According to the Spark 3.4.2 documentation, Spark uses Hive 2.3.9 as the
default metastore version, but it can be configured to use 3.x.x. So I added
"--conf 'spark.sql.hive.metastore.version=3.1.3' --conf
'spark.sql.hive.metastore.jars=maven'" to my pyspark startup script, but it
still fails with the same get_database error.

If I don't specify the metastore configuration parameters (i.e., I just run
"pyspark --conf 'spark.sql.catalogImplementation=hive' --conf
'hive.metastore.uris=thrift://master1:10000'"), Spark creates a metastore
in my current local directory and does not even try to connect to my Hive
server. I wonder why, because I also have hive-site.xml in my
$SPARK_HOME/conf directory, with the following property in it:

<property>
    <name>hive.metastore.uris</name>
    <value>thrift://master1:10000</value>
</property>
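
To sanity-check what the session actually picked up, I read the settings
back from inside pyspark (nothing beyond the standard conf API;
spark.conf.get raises an error for a key that was never set, which is
itself informative):

    >>> spark.conf.get("spark.sql.catalogImplementation")
    >>> spark.conf.get("hive.metastore.uris")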

From a GitHub source-code search, it looks to me like get_database is not
present in Hive 3.x. I also downloaded and tried the Hive 4.0 beta, which
does seem to have get_database, but the problem persists. I would rather
not downgrade Hive to 2.x, since that could cause compatibility issues with
my Hadoop 3.x.

Thanks, James Hsieh
