Bobby Wang created SPARK-51320:
----------------------------------
Summary: Failed to run Spark ML on Connect with pyspark-connect==4.0.0.dev2 installation
Key: SPARK-51320
URL: https://issues.apache.org/jira/browse/SPARK-51320
Project: Spark
Issue Type: Bug
Components: Connect, ML, PySpark
Affects Versions: 4.0.0, 4.1
Reporter: Bobby Wang
After deploying the Spark Connect server with
{code:java}
$SPARK_HOME/sbin/start-connect-server.sh \
  --master local[*] \
  --jars $SPARK_HOME/jars/spark-connect_2.13-4.1.0-SNAPSHOT.jar{code}
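For reference, a quick sanity check (a minimal sketch; it assumes the server above is still running and reachable at the same sc://localhost URL) shows that a plain DataFrame round trip works, so only the ML path is broken:
{code:java}
# Sanity check, independent of ML: confirm the Connect server is reachable.
from pyspark.sql import SparkSession

spark = SparkSession.builder.remote("sc://localhost").getOrCreate()
spark.range(1).show()  # a plain SQL round trip succeeds; only the ML path fails
spark.stop()
{code}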
I installed only the pyspark-connect package instead of the full PySpark distribution:
{code:java}
pip install pyspark-connect==4.0.0.dev2{code}
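The pyspark-connect distribution ships only the Connect client, so the classic JVM-backed modules are not installed. A quick way to check what this environment does and does not provide (a sketch; the module names match the traceback below):
{code:java}
# Inspect the connect-only install: the Connect client modules exist,
# but the classic pyspark.core package does not.
import importlib.util

print(importlib.util.find_spec("pyspark.sql.connect") is not None)  # True
print(importlib.util.find_spec("pyspark.core"))                     # None: absent here
{code}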
Then I ran the code below:
{code:java}
from pyspark.ml.classification import (LogisticRegression,
                                       LogisticRegressionModel)
from pyspark.ml.linalg import Vectors
from pyspark.sql import SparkSession

spark = (SparkSession.builder.remote("sc://localhost")
         .getOrCreate())

df = spark.createDataFrame([
    (Vectors.dense([1.0, 2.0]), 1),
    (Vectors.dense([2.0, -1.0]), 1),
    (Vectors.dense([-3.0, -2.0]), 0),
    (Vectors.dense([-1.0, -2.0]), 0),
], schema=['features', 'label'])

lr = LogisticRegression(maxIter=19, tol=0.0023)
model = lr.fit(df)

print(f"======== model.intercept: {model.intercept}")
print(f"======== model.coefficients: {model.coefficients}")

model.transform(df).show()
{code}
It threw the errors below:
{code:java}
Traceback (most recent call last):
  File "run-demo.py", line 16, in <module>
    lr = LogisticRegression(maxIter=19, tol=0.0023)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/xxx/anaconda3/envs/pyspark-connect/lib/python3.11/site-packages/pyspark/__init__.py", line 115, in wrapper
    return func(self, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^
  File "/home/xxx/anaconda3/envs/pyspark-connect/lib/python3.11/site-packages/pyspark/ml/classification.py", line 1317, in __init__
    self._java_obj = self._new_java_obj(
                     ^^^^^^^^^^^^^^^^^^^
  File "/home/xxx/anaconda3/envs/pyspark-connect/lib/python3.11/site-packages/pyspark/ml/wrapper.py", line 81, in _new_java_obj
    from pyspark.core.context import SparkContext
ModuleNotFoundError: No module named 'pyspark.core'

Exception ignored in: <function JavaWrapper.__del__ at 0x7d32d1fcf2e0>
Traceback (most recent call last):
  File "/home/xxx/anaconda3/envs/pyspark-connect/lib/python3.11/site-packages/pyspark/ml/wrapper.py", line 51, in __del__
    from pyspark.core.context import SparkContext
ModuleNotFoundError: No module named 'pyspark.core'
{code}
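The failure is independent of the estimator: per the traceback, pyspark/ml/wrapper.py (line 81) unconditionally performs a classic-core import that the connect-only install cannot satisfy. A minimal sketch reproducing just that import:
{code:java}
# Minimal reproduction of the root cause: the exact import from
# pyspark/ml/wrapper.py fails under a pyspark-connect-only install.
try:
    from pyspark.core.context import SparkContext
except ModuleNotFoundError as e:
    print(e)  # No module named 'pyspark.core'
{code}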