Bobby Wang created SPARK-46032:
----------------------------------

             Summary: connect: cannot assign instance of 
java.lang.invoke.SerializedLambda to field 
org.apache.spark.rdd.MapPartitionsRDD.f
                 Key: SPARK-46032
                 URL: https://issues.apache.org/jira/browse/SPARK-46032
             Project: Spark
          Issue Type: Bug
          Components: Connect
    Affects Versions: 3.5.0
            Reporter: Bobby Wang


I downloaded spark 3.5 from the spark official website, and then I started a 
Spark Standalone cluster in which both master and the only worker are in the 
same node. 

 

Then I started the connect server by 

_start-connect-server.sh \_
    _--master spark://xxxxx:7077 \_
    _--packages org.apache.spark:spark-connect_2.12:3.5.0 \_
    _--conf spark.executor.cores=12 \_
    _--conf spark.task.cpus=1 \_
    _--executor-memory 30G \_
    _--conf spark.executor.resource.gpu.amount=1 \_
    _--conf spark.task.resource.gpu.amount=0.08 \_
    _--driver-memory 1G \_

 

I can 100% ensure the spark standalone cluster, the connect server and spark 
driver are started from the webui.

 

Finally, I tried to run a very simple spark job 
(spark.range(100).filter("id>2").collect()) from spark-connect-client using 
pyspark, but I got the below error.

 
_pyspark --remote sc://localhost_
_Python 3.10.0 (default, Mar  3 2022, 09:58:08) [GCC 7.5.0] on linux_
_Type "help", "copyright", "credits" or "license" for more information._
_Welcome to_
      _____              ___
     _/ __/__  ___ _____/ /___
    __\ \/ _ \/ _ `/ __/  '_/_
   _/__ / .__/\_,_/_/ /_/\_\   version 3.5.0_
      _/_/_
 
_Using Python version 3.10.0 (default, Mar  3 2022 09:58:08)_
_Client connected to the Spark Connect server at localhost_
_SparkSession available as 'spark'._
_>>> spark.range(100).filter("id > 3").collect()_
_Traceback (most recent call last):_
  _File "<stdin>", line 1, in <module>_
  _File 
"/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/dataframe.py",
 line 1645, in collect_
    _table, schema = self._session.client.to_table(query)_
  _File 
"/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py",
 line 858, in to_table_
    _table, schema, _, _, _ = self._execute_and_fetch(req)_
  _File 
"/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py",
 line 1282, in _execute_and_fetch_
    _for response in self._execute_and_fetch_as_iterator(req):_
  _File 
"/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py",
 line 1263, in _execute_and_fetch_as_iterator_
    _self._handle_error(error)_
  _File 
"/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py",
 line 1502, in _handle_error_
    _self._handle_rpc_error(error)_
  _File 
"/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py",
 line 1538, in _handle_rpc_error_
    _raise convert_exception(info, status.message) from None_
_pyspark.errors.exceptions.connect.SparkConnectGrpcException: 
(org.apache.spark.SparkException) Job aborted due to stage failure: Task 0 in 
stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 
35) (10.19.183.93 executor 0): java.lang.ClassCastException: cannot assign 
instance of java.lang.invoke.SerializedLambda to field 
org.apache.spark.rdd.MapPartitionsRDD.f of type scala.Function3 in instance of 
org.apache.spark.rdd.MapPartitionsRDD_
_at 
java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2301)_
_at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1431)_
_at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2437)_
_at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)_
_at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)_
_at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)_
_at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431)_
_at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)_
_at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)_
_at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)_
_at java.io.ObjectInputStream.readObject(ObjectInputStream.java:503)_
_at java.io.ObjectInputStream.readObject(ObjectInputStream.java:461)_
_at 
org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:87)_
_at 
org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:129)_
_at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:86)_
_at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)_
_at org.apache.spark.scheduler.Task.run(Task.scala:141)_
_at 
org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620)_
_at 
org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)_
_at 
org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)_
_at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94)_
_at org.apache.spark.executor.Executor$TaskRunner..._
 

 

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to