[
https://issues.apache.org/jira/browse/SPARK-46032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17788580#comment-17788580
]
Bobby Wang edited comment on SPARK-46032 at 11/22/23 12:46 AM:
---------------------------------------------------------------
h1. *Standalone Cluster*
*The standalone cluster has a single worker. The worker and the master run on
the same machine, which has 12 CPU cores, 32 GB of memory, and 1 GPU.*
h2. Master Log
{code:java}
Spark Command: /usr/lib/jvm/java-17-openjdk-amd64/bin/java -cp
/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/conf/:/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/jars/*:/etc/hadoop
-Xmx1g org.apache.spark.deploy.master.Master --host 192.168.31.236 --port 7077
--webui-port 8080
========================================
Using Spark's default log4j profile: org/apache/spark/log4j2-defaults.properties
23/11/22 08:37:05 INFO Master: Started daemon with process name: 46828@xxx
23/11/22 08:37:05 INFO SignalUtils: Registering signal handler for TERM
23/11/22 08:37:05 INFO SignalUtils: Registering signal handler for HUP
23/11/22 08:37:05 INFO SignalUtils: Registering signal handler for INT
23/11/22 08:37:05 WARN Utils: Your hostname, spark-bobby resolves to a loopback
address: 127.0.1.1; using 192.168.31.236 instead (on interface wlp82s0)
23/11/22 08:37:05 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another
address
23/11/22 08:37:05 WARN NativeCodeLoader: Unable to load native-hadoop library
for your platform... using builtin-java classes where applicable
23/11/22 08:37:05 INFO SecurityManager: Changing view acls to: xxx
23/11/22 08:37:05 INFO SecurityManager: Changing modify acls to: xxx
23/11/22 08:37:05 INFO SecurityManager: Changing view acls groups to:
23/11/22 08:37:05 INFO SecurityManager: Changing modify acls groups to:
23/11/22 08:37:05 INFO SecurityManager: SecurityManager: authentication
disabled; ui acls disabled; users with view permissions: xxx; groups with view
permissions: EMPTY; users with modify permissions: xxx; groups with modify
permissions: EMPTY
23/11/22 08:37:05 INFO Utils: Successfully started service 'sparkMaster' on
port 7077.
23/11/22 08:37:05 INFO Master: Starting Spark master at
spark://192.168.31.236:7077
23/11/22 08:37:05 INFO Master: Running Spark version 3.5.0
23/11/22 08:37:05 INFO JettyUtils: Start Jetty 0.0.0.0:8080 for MasterUI
23/11/22 08:37:05 INFO Utils: Successfully started service 'MasterUI' on port
8080.
23/11/22 08:37:05 INFO MasterWebUI: Bound MasterWebUI to 0.0.0.0, and started
at http://192.168.31.236:8080
23/11/22 08:37:05 INFO Master: I have been elected leader! New state: ALIVE
23/11/22 08:37:09 INFO Master: Registering worker 192.168.31.236:44911 with 12
cores, 30.0 GiB RAM {code}
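The master registers the worker with 30.0 GiB on a machine with roughly 32 GB of memory. One plausible reading is the documented standalone default for SPARK_WORKER_MEMORY (total memory minus 1 GiB). The sketch below only illustrates that arithmetic. The OS-reported total of 31 GiB is a hypothetical input, not a value taken from the logs, and the 1 GiB floor is an assumption of this sketch.

```python
def default_worker_memory_mib(total_mib: int, floor_mib: int = 1024) -> int:
    """Documented default: total memory minus 1 GiB.

    The lower bound (floor_mib) is an assumption of this sketch.
    """
    return max(total_mib - 1024, floor_mib)


def mib_to_gib_string(mib: int) -> str:
    """Format a MiB count the way the log line shows it, e.g. '30.0 GiB'."""
    return f"{mib / 1024:.1f} GiB"


# Hypothetical: a "32 GB" machine whose OS reports 31 GiB of usable memory.
reported_total_mib = 31 * 1024
worker_mib = default_worker_memory_mib(reported_total_mib)
print(mib_to_gib_string(worker_mib))
```

With the hypothetical input above, the result matches the 30.0 GiB in the registration line, but the real total on this machine may differ.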
h2. Worker Log
{code:java}
Spark Command: /usr/lib/jvm/java-17-openjdk-amd64/bin/java -cp
/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/conf/:/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/jars/*:/etc/hadoop
-Xmx1g org.apache.spark.deploy.worker.Worker --webui-port 8081
spark://192.168.31.236:7077
========================================
Using Spark's default log4j profile: org/apache/spark/log4j2-defaults.properties
23/11/22 08:37:08 INFO Worker: Started daemon with process name:
46981@spark-bobby
23/11/22 08:37:08 INFO SignalUtils: Registering signal handler for TERM
23/11/22 08:37:08 INFO SignalUtils: Registering signal handler for HUP
23/11/22 08:37:08 INFO SignalUtils: Registering signal handler for INT
23/11/22 08:37:08 WARN Utils: Your hostname, spark-bobby resolves to a loopback
address: 127.0.1.1; using 192.168.31.236 instead (on interface wlp82s0)
23/11/22 08:37:08 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another
address
23/11/22 08:37:08 WARN NativeCodeLoader: Unable to load native-hadoop library
for your platform... using builtin-java classes where applicable
23/11/22 08:37:08 INFO SecurityManager: Changing view acls to: xxx
23/11/22 08:37:08 INFO SecurityManager: Changing modify acls to: xxx
23/11/22 08:37:08 INFO SecurityManager: Changing view acls groups to:
23/11/22 08:37:08 INFO SecurityManager: Changing modify acls groups to:
23/11/22 08:37:08 INFO SecurityManager: SecurityManager: authentication
disabled; ui acls disabled; users with view permissions: xxx; groups with view
permissions: EMPTY; users with modify permissions: xxx; groups with modify
permissions: EMPTY
23/11/22 08:37:08 INFO Utils: Successfully started service 'sparkWorker' on
port 44911.
23/11/22 08:37:08 INFO Worker: Worker decommissioning not enabled.
23/11/22 08:37:08 INFO Worker: Starting Spark worker 192.168.31.236:44911 with
12 cores, 30.0 GiB RAM
23/11/22 08:37:08 INFO Worker: Running Spark version 3.5.0
23/11/22 08:37:08 INFO Worker: Spark home:
/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3
23/11/22 08:37:08 INFO ResourceDiscoveryScriptPlugin: Discovering resources for
gpu with script: /home/xxx/github/mytools/nvidia/script/getGpusResources.sh
23/11/22 08:37:08 INFO ResourceUtils:
==============================================================
23/11/22 08:37:08 INFO ResourceUtils: Custom resources for spark.worker:
gpu -> [name: gpu, addresses: 0]
23/11/22 08:37:08 INFO ResourceUtils:
==============================================================
23/11/22 08:37:08 INFO JettyUtils: Start Jetty 0.0.0.0:8081 for WorkerUI
23/11/22 08:37:08 INFO Utils: Successfully started service 'WorkerUI' on port
8081.
23/11/22 08:37:08 INFO WorkerWebUI: Bound WorkerWebUI to 0.0.0.0, and started
at http://192.168.31.236:8081
23/11/22 08:37:08 INFO Worker: Connecting to master 192.168.31.236:7077...
23/11/22 08:37:08 INFO TransportClientFactory: Successfully created connection
to /192.168.31.236:7077 after 17 ms (0 ms spent in bootstraps)
23/11/22 08:37:09 INFO Worker: Successfully registered with master
spark://192.168.31.236:7077
{code}
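The worker log shows the GPU being found through a discovery script. Spark's documentation says such a script must write a JSON string to STDOUT with a name and an array of addresses. The sketch below parses a sample of that shape and prints it in the form seen in the log. The sample string is an assumption about what getGpusResources.sh printed on this machine, and parse_discovery_output is a hypothetical helper, not a Spark API.

```python
import json
from typing import List, Tuple


def parse_discovery_output(stdout_text: str) -> Tuple[str, List[str]]:
    """Parse the JSON a resource discovery script writes to STDOUT."""
    info = json.loads(stdout_text)
    name = info["name"]
    addresses = [str(a) for a in info["addresses"]]
    return name, addresses


# Assumed script output for a single GPU at address 0.
sample_stdout = '{"name": "gpu", "addresses": ["0"]}'
name, addresses = parse_discovery_output(sample_stdout)
print(f"{name} -> [name: {name}, addresses: {','.join(addresses)}]")
```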
> connect: cannot assign instance of java.lang.invoke.SerializedLambda to field
> org.apache.spark.rdd.MapPartitionsRDD.f
> ---------------------------------------------------------------------------------------------------------------------
>
> Key: SPARK-46032
> URL: https://issues.apache.org/jira/browse/SPARK-46032
> Project: Spark
> Issue Type: Bug
> Components: Connect
> Affects Versions: 3.5.0
> Reporter: Bobby Wang
> Priority: Major
>
> I downloaded Spark 3.5.0 from the official Spark website and started a Spark
> Standalone cluster in which the master and the only worker run on the same
> node.
>
> Then I started the Connect server with:
> {code:java}
> start-connect-server.sh \
> --master spark://10.19.183.93:7077 \
> --packages org.apache.spark:spark-connect_2.12:3.5.0 \
> --conf spark.executor.cores=12 \
> --conf spark.task.cpus=1 \
> --executor-memory 30G \
> --conf spark.executor.resource.gpu.amount=1 \
> --conf spark.task.resource.gpu.amount=0.08 \
> --driver-memory 1G{code}
>
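The settings in the command above give each executor 12 cores and 1 GPU, with each task asking for 1 CPU and 0.08 of a GPU. A simplified model of how many tasks can then run at once on one executor is sketched below. It assumes a fractional GPU amount is treated as floor(1 / amount) tasks per GPU. max_concurrent_tasks is a hypothetical helper for illustration, not Spark's scheduler code.

```python
import math


def max_concurrent_tasks(executor_cores: int, task_cpus: int,
                         executor_gpus: int, task_gpu_amount: float) -> int:
    """Simplified slot count: the tighter of the CPU limit and the GPU limit."""
    by_cpu = executor_cores // task_cpus
    if task_gpu_amount < 1:
        # Fractional request: several tasks share one GPU address.
        by_gpu = executor_gpus * math.floor(1 / task_gpu_amount)
    else:
        # Whole-GPU request: each task takes that many GPUs.
        by_gpu = executor_gpus // int(task_gpu_amount)
    return min(by_cpu, by_gpu)


# Values taken from the start-connect-server.sh command.
slots = max_concurrent_tasks(executor_cores=12, task_cpus=1,
                             executor_gpus=1, task_gpu_amount=0.08)
print(slots)
```

Under this model the CPU and GPU limits agree at 12 tasks, so the fractional GPU amount does not reduce parallelism below the core count.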
> I confirmed from the web UI that the Spark standalone cluster, the Connect
> server, and the Spark driver all started.
>
> Finally, I ran a very simple Spark job
> (spark.range(100).filter("id>2").collect()) from the Spark Connect client
> using pyspark, but got the error below.
>
> _pyspark --remote sc://localhost_
> _Python 3.10.0 (default, Mar 3 2022, 09:58:08) [GCC 7.5.0] on linux_
> _Type "help", "copyright", "credits" or "license" for more information._
> _Welcome to_
>       ____              __
>      / __/__  ___ _____/ /__
>     _\ \/ _ \/ _ `/ __/  '_/
>    /__ / .__/\_,_/_/ /_/\_\   version 3.5.0
>       /_/
>
> _Using Python version 3.10.0 (default, Mar 3 2022 09:58:08)_
> _Client connected to the Spark Connect server at localhost_
> _SparkSession available as 'spark'._
> _>>> spark.range(100).filter("id > 3").collect()_
> _Traceback (most recent call last):_
> _File "<stdin>", line 1, in <module>_
> _File
> "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/dataframe.py",
> line 1645, in collect_
> _table, schema = self._session.client.to_table(query)_
> _File
> "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py",
> line 858, in to_table_
> _table, schema, _, _, _ = self._execute_and_fetch(req)_
> _File
> "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py",
> line 1282, in _execute_and_fetch_
> _for response in self._execute_and_fetch_as_iterator(req):_
> _File
> "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py",
> line 1263, in _execute_and_fetch_as_iterator_
> _self._handle_error(error)_
> _File
> "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py",
> line 1502, in _handle_error_
> _self._handle_rpc_error(error)_
> _File
> "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py",
> line 1538, in _handle_rpc_error_
> _raise convert_exception(info, status.message) from None_
> _pyspark.errors.exceptions.connect.SparkConnectGrpcException:
> (org.apache.spark.SparkException) Job aborted due to stage failure: Task 0 in
> stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0
> (TID 35) (10.19.183.93 executor 0): java.lang.ClassCastException: cannot
> assign instance of java.lang.invoke.SerializedLambda to field
> org.apache.spark.rdd.MapPartitionsRDD.f of type scala.Function3 in instance
> of org.apache.spark.rdd.MapPartitionsRDD_
> _at
> java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2301)_
> _at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1431)_
> _at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2437)_
> _at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)_
> _at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)_
> _at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)_
> _at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431)_
> _at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)_
> _at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)_
> _at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)_
> _at java.io.ObjectInputStream.readObject(ObjectInputStream.java:503)_
> _at java.io.ObjectInputStream.readObject(ObjectInputStream.java:461)_
> _at
> org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:87)_
> _at
> org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:129)_
> _at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:86)_
> _at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)_
> _at org.apache.spark.scheduler.Task.run(Task.scala:141)_
> _at
> org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620)_
> _at
> org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)_
> _at
> org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)_
> _at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94)_
> _at org.apache.spark.executor.Executor$TaskRunner..._
>
>
>