GitHub user jankogasic created a discussion: Connecting PySpark with Hive tables

Hello, I am trying to use PySpark to access Spark through Kyuubi. The issue I have
is with the Hive dialect: the queries that end up on the cluster are rewritten with
odd syntax and fail because of it.

STEPS TO REPRODUCE:
- sudo apt install -y openjdk-17-jdk
- export JAVA_HOME=/usr/lib/jvm/java-17-openjdk-amd64
- export PATH="$JAVA_HOME/bin:$PATH"
- pip install pyspark 'pyspark[sql]' 'pyspark[pandas_on_spark]' 
- sudo apt install -y krb5-user
- sudo cp krb5.conf /etc/
- kinit -kt janko-gasic.keytab [email protected]
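
After the `kinit` above, a quick way to confirm the ticket is actually visible to the Python process that will launch PySpark (assuming `klist` is on PATH):

```
import subprocess

# Print the Kerberos ticket cache; the kinit step above should have populated it.
subprocess.run(["klist"], check=True)
```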

```
import pyspark
print(pyspark.__file__)
print(pyspark.__version__)

from pyspark.sql import SparkSession

DRIVER_JAR  = "./kyuubi-hive-jdbc-shaded-1.10.2.jar"
DIALECT_JAR = "./kyuubi-extension-spark-jdbc-dialect_2.12-1.10.2.jar"

spark = (
    SparkSession.builder
    .appName("KyuubiJDBC")
    .config("spark.jars", f"{DRIVER_JAR},{DIALECT_JAR}")
    # make sure the driver also sees them
    .config("spark.driver.extraClassPath", f"{DRIVER_JAR}:{DIALECT_JAR}")
    # register the dialect (this is what enables ARRAY/MAP/STRUCT over JDBC)
    .config("spark.sql.extensions", 
"org.apache.spark.sql.dialect.KyuubiSparkJdbcDialectExtension")
    .getOrCreate()
)

jdbc_url = (
    "jdbc:kyuubi://spark1.lan.bla1.us:2181,spark2.lan.bla1.us:2181,"
    ";serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=kyuubi"
    ";auth=KERBEROS;principal=kyuubi/[email protected];ssl=true"
)

try:
    df = (
        spark.read.format("jdbc")
        .option("driver", "org.apache.kyuubi.jdbc.KyuubiHiveDriver")
        .option("url", jdbc_url)
        .option("query", "select 1") # 
<--------------------------------------------------- simple query
        .load()
    )
except Exception as e:
    print(e)
finally:
    spark.stop()
    
df.printSchema()
df.show(5)
```
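
As a sanity check, the session can echo the extensions setting back (this only confirms the config string was applied, not that the dialect class actually loaded from the JAR):

```
# Should print the KyuubiSparkJdbcDialectExtension class configured above.
print(spark.conf.get("spark.sql.extensions"))
# Effective Spark version, to spot a mismatch with the shaded Kyuubi JARs.
print(spark.version)
```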

The error I get is:
```
Py4JJavaError: An error occurred while calling o123.showString.
: java.lang.NullPointerException: Cannot invoke "org.apache.spark.SparkEnv.rpcEnv()"
  because the return value of "org.apache.spark.SparkEnv$.get()" is null
```
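
One thing I notice re-reading the snippet: `spark.stop()` runs in the `finally` block before `df.printSchema()` and `df.show(5)`, so the actions hit a stopped session. A minimal sketch of that failure mode with no JDBC involved (my assumption that this is related; the exact exception text may vary by Spark version):

```
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("stopped-session-repro").getOrCreate()
df = spark.range(5)  # trivial DataFrame, no JDBC or Kyuubi involved
spark.stop()         # tears down the SparkContext / SparkEnv
df.show()            # action on a stopped session fails with a similar error
```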

pyspark 4.0.1 (also tried 3.5, but no luck)
kyuubi 1.10.2

GitHub link: https://github.com/apache/kyuubi/discussions/7240
