test2022123 created SPARK-40638:
-----------------------------------

             Summary: RpcOutboxMessage: Ask terminated before connecting 
successfully
                 Key: SPARK-40638
                 URL: https://issues.apache.org/jira/browse/SPARK-40638
             Project: Spark
          Issue Type: Bug
          Components: PySpark
    Affects Versions: 3.3.0
         Environment: mac 12.6

Python 3.8.13

spark-3.3.0-bin-hadoop3

 

docker-compose.yml:
{code:yaml}
version: '3'
services:
  spark-master:
    image: docker.io/bitnami/spark:3.3
    hostname: spark-master
    environment:
      - SPARK_MODE=master
      - SPARK_RPC_AUTHENTICATION_ENABLED=no
      - SPARK_RPC_ENCRYPTION_ENABLED=no
      - SPARK_LOCAL_STORAGE_ENCRYPTION_ENABLED=no
      - SPARK_SSL_ENABLED=no
      - SPARK_LOCAL_HOSTNAME=spark-master  
    ports:
      - '8080:8080'
      - '7077:7077'
    networks:
      - spark-network
      
      
  spark-worker-1:
    image: docker.io/bitnami/spark:3.3
    hostname: spark-worker-1
    depends_on: 
      - spark-master
    environment:
      - SPARK_MODE=worker
      - SPARK_MASTER_URL=spark://spark-master:7077
      - SPARK_WORKER_MEMORY=4g
      - SPARK_WORKER_CORES=8
      - SPARK_WORKER_PORT=6061
      - SPARK_RPC_AUTHENTICATION_ENABLED=no
      - SPARK_RPC_ENCRYPTION_ENABLED=no
      - SPARK_LOCAL_STORAGE_ENCRYPTION_ENABLED=no
      - SPARK_SSL_ENABLED=no
      - SPARK_LOCAL_HOSTNAME=spark-worker-1
    ports:
      - '14040:4040'
      - '18081:8081'
      - '16061:6061'
    networks:
      - spark-network
      
      
  spark-worker-2:
    image: docker.io/bitnami/spark:3.3
    hostname: spark-worker-2
    depends_on: 
      - spark-worker-1
    environment:
      - SPARK_MODE=worker
      - SPARK_MASTER_URL=spark://spark-master:7077
      - SPARK_WORKER_MEMORY=4g
      - SPARK_WORKER_CORES=8
      - SPARK_WORKER_PORT=6062
      - SPARK_RPC_AUTHENTICATION_ENABLED=no
      - SPARK_RPC_ENCRYPTION_ENABLED=no
      - SPARK_LOCAL_STORAGE_ENCRYPTION_ENABLED=no
      - SPARK_SSL_ENABLED=no
      - SPARK_LOCAL_HOSTNAME=spark-worker-2
    ports:
      - '24040:4040'
      - '28081:8081'    
      - '26062:6062'
    networks:
      - spark-network

networks:
    spark-network: {code}
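When the job hangs like this, one thing worth checking first is that both workers actually registered with the master. The standalone master's web UI (mapped to port 8080 above) also serves the cluster state as JSON at /json; a minimal sketch of parsing it follows (the payload here is a trimmed, made-up example, and the field names are assumptions based on the Spark 3.3 standalone master UI):

```python
import json

# Trimmed, hypothetical example of what http://localhost:8080/json returns;
# the field names ("workers", "state", "aliveworkers") are assumptions based
# on the Spark 3.3 standalone master web UI.
sample = json.loads("""
{
  "url": "spark://spark-master:7077",
  "workers": [
    {"host": "spark-worker-1", "port": 6061, "state": "ALIVE"},
    {"host": "spark-worker-2", "port": 6062, "state": "ALIVE"}
  ],
  "aliveworkers": 2,
  "status": "ALIVE"
}
""")

def alive_workers(payload: dict) -> list:
    """Hosts of the workers the master reports as ALIVE."""
    return [w["host"] for w in payload.get("workers", [])
            if w.get("state") == "ALIVE"]

print(alive_workers(sample))
```

In the real setup one would fetch http://localhost:8080/json (e.g. with urllib) instead of the inline sample; if the list comes back empty while the job is stuck, the problem is worker registration rather than driver connectivity.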
            Reporter: test2022123


{color:#FF0000}*PySpark submit job gets stuck and retries infinitely.*{color}



*PySpark job launched with:*
{code:java}
$ PYSPARK_PYTHON=python SPARK_HOME="/Users/mike/Tools/spark-3.3.0-bin-hadoop3" pyspark --master spark://spark-master:7077
Python 3.8.13 (default, Mar 28 2022, 06:16:26)
[Clang 12.0.0 ] :: Anaconda, Inc. on darwin
Type "help", "copyright", "credits" or "license" for more information.
22/10/03 10:23:32 WARN Utils: Your hostname, codecan.local resolves to a 
loopback address: 127.0.0.1; using 192.168.31.31 instead (on interface en5)
22/10/03 10:23:32 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another 
address
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use 
setLogLevel(newLevel).
22/10/03 10:23:32 WARN NativeCodeLoader: Unable to load native-hadoop library 
for your platform... using builtin-java classes where applicable
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 3.3.0
      /_/

Using Python version 3.8.13 (default, Mar 28 2022 06:16:26)
Spark context Web UI available at http://192.168.31.31:4040
Spark context available as 'sc' (master = spark://spark-master:7077, app id = 
app-20221003022333-0000).
SparkSession available as 'spark'.
>>> from pyspark.sql.functions import col
>>> spark.range(0,5).select(col("id").cast("double")).agg({'id': 'sum'}).show()
22/10/03 10:24:24 WARN TaskSchedulerImpl: Initial job has not accepted any 
resources; check your cluster UI to ensure that workers are registered and have 
sufficient resources
22/10/03 10:24:39 WARN TaskSchedulerImpl: Initial job has not accepted any 
resources; check your cluster UI to ensure that workers are registered and have 
sufficient resources
22/10/03 10:24:54 WARN TaskSchedulerImpl: Initial job has not accepted any 
resources; check your cluster UI to ensure that workers are registered and have 
sufficient resources
22/10/03 10:25:09 WARN TaskSchedulerImpl: Initial job has not accepted any 
resources; check your cluster UI to ensure that workers are registered and have 
sufficient resources
22/10/03 10:25:24 WARN TaskSchedulerImpl: Initial job has not accepted any 
resources; check your cluster UI to ensure that workers are registered and have 
sufficient resources {code}
*spark-defaults.conf*
{code:java}
spark.driver.port 13333
spark.executor.memory 512m
spark.executor.cores 1
spark.executor.instances 2
spark.cores.max 1
spark.shuffle.service.enabled false
spark.dynamicAllocation.enabled false {code}
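A note on this config: the driver runs on the macOS host while the executors run inside Docker, so the executors dial back to whatever spark.driver.host resolves to, and the executor logs show them timing out against 192.168.31.31:13333. Below is a hedged sketch of the driver-side settings commonly tried for this split layout, rendered as spark-defaults pairs (host.docker.internal is Docker Desktop's host alias on macOS and is an assumption here, not a confirmed fix):

```python
# Sketch: the reporter's spark-defaults.conf plus the two driver-address keys
# usually adjusted when the driver is on the host and executors are in Docker.
# "host.docker.internal" is an assumption (Docker Desktop host alias on macOS).
conf_pairs = {
    "spark.driver.port": "13333",                 # fixed so it is predictable
    "spark.driver.host": "host.docker.internal",  # address executors dial back to
    "spark.driver.bindAddress": "0.0.0.0",        # listen on all host interfaces
    "spark.executor.memory": "512m",
    "spark.cores.max": "1",
}

def to_spark_defaults(pairs: dict) -> str:
    """Render the pairs in spark-defaults.conf format: one 'key value' per line."""
    return "\n".join(f"{k} {v}" for k, v in sorted(pairs.items()))

print(to_spark_defaults(conf_pairs))
```

The same pairs can equally be passed as --conf flags to pyspark; the point of the sketch is only that spark.driver.host must be an address reachable from inside the worker containers.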
h1. stderr log page for app-20221003022333-0000/0
{code:java}
Spark Executor Command: "/opt/bitnami/java/bin/java" "-cp" 
"/opt/bitnami/spark/conf/:/opt/bitnami/spark/jars/*" "-Xmx512M" 
"-Dspark.driver.port=13333" "-XX:+IgnoreUnrecognizedVMOptions" 
"--add-opens=java.base/java.lang=ALL-UNNAMED" 
"--add-opens=java.base/java.lang.invoke=ALL-UNNAMED" 
"--add-opens=java.base/java.lang.reflect=ALL-UNNAMED" 
"--add-opens=java.base/java.io=ALL-UNNAMED" 
"--add-opens=java.base/java.net=ALL-UNNAMED" 
"--add-opens=java.base/java.nio=ALL-UNNAMED" 
"--add-opens=java.base/java.util=ALL-UNNAMED" 
"--add-opens=java.base/java.util.concurrent=ALL-UNNAMED" 
"--add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED" 
"--add-opens=java.base/sun.nio.ch=ALL-UNNAMED" 
"--add-opens=java.base/sun.nio.cs=ALL-UNNAMED" 
"--add-opens=java.base/sun.security.action=ALL-UNNAMED" 
"--add-opens=java.base/sun.util.calendar=ALL-UNNAMED" 
"--add-opens=java.security.jgss/sun.security.krb5=ALL-UNNAMED" 
"org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url" 
"spark://CoarseGrainedScheduler@192.168.31.31:13333" "--executor-id" "0" 
"--hostname" "spark-worker-1" "--cores" "1" "--app-id" 
"app-20221003022333-0000" "--worker-url" "spark://Worker@spark-worker-1:6061"
========================================

Using Spark's default log4j profile: org/apache/spark/log4j2-defaults.properties
Exception in thread "main" java.lang.reflect.UndeclaredThrowableException
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1894)
        at 
org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:61)
        at 
org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:424)
        at 
org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:413)
        at 
org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
Caused by: org.apache.spark.rpc.RpcTimeoutException: Futures timed out after 
[120 seconds]. This timeout is controlled by spark.rpc.lookupTimeout
        at 
org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:47)
        at 
org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:62)
        at 
org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:58)
        at 
scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:38)
        at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:76)
        at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:102)
        at 
org.apache.spark.executor.CoarseGrainedExecutorBackend$.$anonfun$run$9(CoarseGrainedExecutorBackend.scala:444)
        at 
scala.runtime.java8.JFunction1$mcVI$sp.apply(JFunction1$mcVI$sp.java:23)
        at 
scala.collection.TraversableLike$WithFilter.$anonfun$foreach$1(TraversableLike.scala:985)
        at scala.collection.immutable.Range.foreach(Range.scala:158)
        at 
scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:984)
        at 
org.apache.spark.executor.CoarseGrainedExecutorBackend$.$anonfun$run$7(CoarseGrainedExecutorBackend.scala:442)
        at 
org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:62)
        at 
org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:61)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878)
        ... 4 more
Caused by: java.util.concurrent.TimeoutException: Futures timed out after [120 
seconds]
        at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:259)
        at 
scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:263)
        at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:293)
        at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
        ... 16 more {code}
h1. stdout log page for app-20221003022333-0000/0 ([http://spark-worker-1:18081/])
{code:java}
22/10/03 02:23:35 INFO CoarseGrainedExecutorBackend: Started daemon with 
process name: 107@spark-worker-1
22/10/03 02:23:35 INFO SignalUtils: Registering signal handler for TERM
22/10/03 02:23:35 INFO SignalUtils: Registering signal handler for HUP
22/10/03 02:23:35 INFO SignalUtils: Registering signal handler for INT
22/10/03 02:23:35 WARN NativeCodeLoader: Unable to load native-hadoop library 
for your platform... using builtin-java classes where applicable
22/10/03 02:23:35 INFO SecurityManager: Changing view acls to: spark,mike
22/10/03 02:23:35 INFO SecurityManager: Changing modify acls to: spark,mike
22/10/03 02:23:35 INFO SecurityManager: Changing view acls groups to: 
22/10/03 02:23:35 INFO SecurityManager: Changing modify acls groups to: 
22/10/03 02:23:35 INFO SecurityManager: SecurityManager: authentication 
disabled; ui acls disabled; users  with view permissions: Set(spark, mike); 
groups with view permissions: Set(); users  with modify permissions: Set(spark, 
mike); groups with modify permissions: Set()
22/10/03 02:25:35 ERROR RpcOutboxMessage: Ask terminated before connecting 
successfully
22/10/03 02:25:35 WARN NettyRpcEnv: Ignored failure: java.io.IOException: 
Connecting to /192.168.31.31:13333 timed out (120000 ms)
22/10/03 02:27:35 ERROR RpcOutboxMessage: Ask terminated before connecting 
successfully
22/10/03 02:27:35 WARN NettyRpcEnv: Ignored failure: java.io.IOException: 
Connecting to /192.168.31.31:13333 timed out (120000 ms) {code}
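The executor-side timeouts above suggest the containers simply cannot open a TCP connection back to the driver at 192.168.31.31:13333. A small self-contained probe for that (a sketch; in the real setup it would be run from inside a worker container against the actual driver address, the local listener below is only a stand-in so the snippet runs anywhere):

```python
import socket

def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Demo against a local listener (a stand-in for the driver at 192.168.31.31:13333).
server = socket.socket()
server.bind(("127.0.0.1", 0))   # ephemeral port
server.listen(1)
open_port = server.getsockname()[1]

reachable = can_connect("127.0.0.1", open_port, timeout=1.0)
server.close()
unreachable = can_connect("127.0.0.1", open_port, timeout=1.0)  # listener gone
print(reachable, unreachable)
```

If the probe fails from inside a container against the driver's host and port, the retries in the log are expected: the executors keep asking an endpoint they can never reach.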



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
