Hello, (I just asked the same question on Stack Overflow; I am not sure whether I will get an answer here or there faster. If I get one, I will update the other side.)
I am practicing HDFS, Hive and Spark. I installed Hadoop 3.3.6, Hive 3.1.3 and Spark 3.4.2, but I am unable to run any SQL from the pyspark shell. The error I get is:

    org.apache.thrift.TApplicationException: Invalid method name: 'get_database'

From the Spark 3.4.2 documentation, Spark uses Hive 2.3.9 as the default metastore client version, but it can be configured to 3.x.x. So I added

    --conf 'spark.sql.hive.metastore.version=3.1.3' --conf 'spark.sql.hive.metastore.jars=maven'

to my pyspark shell startup script, but it still fails with the same get_database error. If I leave out the metastore configuration parameters, i.e. just run

    pyspark --conf 'spark.sql.catalogImplementation=hive' --conf 'hive.metastore.uris=thrift://master1:10000'

then Spark creates a metastore in my current local directory and does not even try to connect to my Hive server. I am wondering why. I also have hive-site.xml placed in my $SPARK_HOME/conf directory, with the following property in it:

    <property>
      <name>hive.metastore.uris</name>
      <value>thrift://master1:10000</value>
    </property>

From a GitHub source code search, it seems to me that get_database is not there in Hive 3.x. I also downloaded and tried the Hive 4.0 beta, which does seem to have get_database, but the problem persists. I would rather not downgrade Hive to 2.x, since that could cause compatibility issues with my Hadoop 3.x.

Thanks,
James Hsieh
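
P.S. In case it helps, here is roughly what my pyspark startup corresponds to when written as a standalone PySpark script (a minimal sketch built from the same configuration values quoted above; the app name and the SHOW DATABASES test query are just placeholders I use for testing):

    from pyspark.sql import SparkSession

    # Same settings I pass to the pyspark shell via --conf:
    # spark.sql.hive.metastore.version / .jars choose the Hive client jars,
    # and hive.metastore.uris points at my Hive server on master1.
    spark = (
        SparkSession.builder
        .appName("metastore-test")  # placeholder name
        .config("spark.sql.hive.metastore.version", "3.1.3")
        .config("spark.sql.hive.metastore.jars", "maven")
        .config("hive.metastore.uris", "thrift://master1:10000")
        .enableHiveSupport()  # equivalent to spark.sql.catalogImplementation=hive
        .getOrCreate()
    )

    # This is the call that fails with the TApplicationException.
    spark.sql("SHOW DATABASES").show()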