[
https://issues.apache.org/jira/browse/SPARK-51320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17930906#comment-17930906
]
Bobby Wang commented on SPARK-51320:
------------------------------------
Closing this task: pyspark-connect==4.0.0.dev2 is quite old and doesn't
include the latest Spark Connect ML features. I installed a locally compiled
pyspark-client instead, and it worked very well.
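
The traceback in the quoted report fails on `from pyspark.core.context import SparkContext`, which suggests the client-only install does not ship the `pyspark.core` package. As a rough pre-flight check, assuming the presence of `pyspark.core` is a usable signal for a full (classic) install, something like the sketch below could be run before using `pyspark.ml`. The helper names `has_module` and `describe_pyspark_install` and the returned labels are hypothetical, not part of PySpark.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if the module `name` can be located in this environment."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # find_spec imports parent packages first; a missing parent
        # raises ModuleNotFoundError (a subclass of ImportError).
        return False


def describe_pyspark_install() -> str:
    """Classify the local PySpark install (labels are illustrative only)."""
    if not has_module("pyspark"):
        return "missing"
    # Assumption: pyspark.core is only shipped with the full distribution.
    if has_module("pyspark.core"):
        return "classic"
    return "connect-only"


if __name__ == "__main__":
    print(f"pyspark install: {describe_pyspark_install()}")
```

A "connect-only" result only says that `pyspark.core` is absent; whether `pyspark.ml` works over Spark Connect still depends on the client version, as noted above.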
> Failed to run spark ml on connect with pyspark-connect==4.0.0.dev2
> installation
> -------------------------------------------------------------------------------
>
> Key: SPARK-51320
> URL: https://issues.apache.org/jira/browse/SPARK-51320
> Project: Spark
> Issue Type: Bug
> Components: Connect, ML, PySpark
> Affects Versions: 4.0.0, 4.1
> Reporter: Bobby Wang
> Priority: Major
>
> After deploying the Spark Connect server with
>
> {code:java}
> $SPARK_HOME/sbin/start-connect-server.sh \
> --master local[*] \
> --jars $SPARK_HOME/jars/spark-connect_2.13-4.1.0-SNAPSHOT.jar{code}
>
> I installed only the pyspark-connect package, instead of full Spark, with
>
> {code:java}
> pip install pyspark-connect==4.0.0.dev2{code}
>
>
> Then I ran the code below:
>
>
> {code:java}
> from pyspark.ml.classification import (LogisticRegression,
>                                        LogisticRegressionModel)
> from pyspark.ml.linalg import Vectors
> from pyspark.sql import SparkSession
> spark = (SparkSession.builder.remote("sc://localhost")
>          .getOrCreate())
> df = spark.createDataFrame([
>     (Vectors.dense([1.0, 2.0]), 1),
>     (Vectors.dense([2.0, -1.0]), 1),
>     (Vectors.dense([-3.0, -2.0]), 0),
>     (Vectors.dense([-1.0, -2.0]), 0),
> ], schema=['features', 'label'])
> lr = LogisticRegression(maxIter=19, tol=0.0023)
> model = lr.fit(df)
> print(f"======== model.intercept: {model.intercept}")
> print(f"======== model.coefficients: {model.coefficients}")
> model.transform(df).show()
> {code}
>
> It threw the following errors:
>
> Traceback (most recent call last):
> File "run-demo.py", line 16, in <module>
> lr = LogisticRegression(maxIter=19, tol=0.0023)
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> File
> "/home/xxx/anaconda3/envs/pyspark-connect/lib/python3.11/site-packages/pyspark/__init__.py",
> line 115, in wrapper
> return func(self, **kwargs)
> ^^^^^^^^^^^^^^^^^^^^
> File
> "/home/xxx/anaconda3/envs/pyspark-connect/lib/python3.11/site-packages/pyspark/ml/classification.py",
> line 1317, in __init__
> self._java_obj = self._new_java_obj(
> ^^^^^^^^^^^^^^^^^^^
> File
> "/home/xxx/anaconda3/envs/pyspark-connect/lib/python3.11/site-packages/pyspark/ml/wrapper.py",
> line 81, in _new_java_obj
> from pyspark.core.context import SparkContext
> ModuleNotFoundError: No module named 'pyspark.core'
> Exception ignored in: <function JavaWrapper.__del__ at 0x7d32d1fcf2e0>
> Traceback (most recent call last):
> File
> "/home/xxx/anaconda3/envs/pyspark-connect/lib/python3.11/site-packages/pyspark/ml/wrapper.py",
> line 51, in __del__
> from pyspark.core.context import SparkContext
> ModuleNotFoundError: No module named 'pyspark.core'
>