Dongjoon Hyun created SPARK-33681:
-------------------------------------

             Summary: Increase K8s IT timeout
                 Key: SPARK-33681
                 URL: https://issues.apache.org/jira/browse/SPARK-33681
             Project: Spark
          Issue Type: Bug
          Components: Kubernetes, Tests
    Affects Versions: 2.4.7
            Reporter: Dongjoon Hyun


The "Run PySpark with memory customization" K8s integration test failed on the PR builder: the {{eventually}} check gave up after 70 attempts over roughly 2 minutes while the executor pod was still pending ("Initial job has not accepted any resources"), so the driver log never contained "PySpark Worker Memory Check is: True". The integration-test timeout should be increased so the test tolerates slow Jenkins workers (a sketch of the relevant timeout follows the console log below).

- https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/36905/console
{code}
- Run PySpark with memory customization *** FAILED ***
  The code passed to eventually never returned normally. Attempted 70 times over 2.018373577433333 minutes. Last failure message: "++ id -u
  + myuid=0
  ++ id -g
  + mygid=0
  + set +e
  ++ getent passwd 0
  + uidentry=root:x:0:0:root:/root:/bin/bash
  + set -e
  + '[' -z root:x:0:0:root:/root:/bin/bash ']'
  + SPARK_K8S_CMD=driver-py
  + case "$SPARK_K8S_CMD" in
  + shift 1
  + SPARK_CLASSPATH=':/opt/spark/jars/*'
  + env
  + sort -t_ -k4 -n
  + sed 's/[^=]*=\(.*\)/\1/g'
  + grep SPARK_JAVA_OPT_
  + readarray -t SPARK_EXECUTOR_JAVA_OPTS
  + '[' -n '' ']'
  + '[' -n /opt/spark/tests/py_container_checks.py ']'
  + PYTHONPATH='/opt/spark/python/lib/pyspark.zip:/opt/spark/python/lib/py4j-*.zip:/opt/spark/tests/py_container_checks.py'
  + PYSPARK_ARGS=
  + '[' -n 209715200 ']'
  + PYSPARK_ARGS=209715200
  + R_ARGS=
  + '[' -n '' ']'
  + '[' 3 == 2 ']'
  + '[' 3 == 3 ']'
  ++ python3 -V
  + pyv3='Python 3.7.3'
  + export PYTHON_VERSION=3.7.3
  + PYTHON_VERSION=3.7.3
  + export PYSPARK_PYTHON=python3
  + PYSPARK_PYTHON=python3
  + export PYSPARK_DRIVER_PYTHON=python3
  + PYSPARK_DRIVER_PYTHON=python3
  + '[' -n '' ']'
  + '[' -z ']'
  + case "$SPARK_K8S_CMD" in
  + CMD=("$SPARK_HOME/bin/spark-submit" --conf "spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS" --deploy-mode client "$@" $PYSPARK_PRIMARY $PYSPARK_ARGS)
  + exec /usr/bin/tini -s -- /opt/spark/bin/spark-submit --conf spark.driver.bindAddress=172.17.0.4 --deploy-mode client --properties-file /opt/spark/conf/spark.properties --class org.apache.spark.deploy.PythonRunner /opt/spark/tests/worker_memory_check.py 209715200
  20/12/07 00:09:32 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
  Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
  20/12/07 00:09:33 INFO SparkContext: Running Spark version 2.4.8-SNAPSHOT
  20/12/07 00:09:33 INFO SparkContext: Submitted application: PyMemoryTest
  20/12/07 00:09:33 INFO SecurityManager: Changing view acls to: root
  20/12/07 00:09:33 INFO SecurityManager: Changing modify acls to: root
  20/12/07 00:09:33 INFO SecurityManager: Changing view acls groups to: 
  20/12/07 00:09:33 INFO SecurityManager: Changing modify acls groups to: 
  20/12/07 00:09:33 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(root); groups with view permissions: Set(); users  with modify permissions: Set(root); groups with modify permissions: Set()
  20/12/07 00:09:34 INFO Utils: Successfully started service 'sparkDriver' on port 7078.
  20/12/07 00:09:34 INFO SparkEnv: Registering MapOutputTracker
  20/12/07 00:09:34 INFO SparkEnv: Registering BlockManagerMaster
  20/12/07 00:09:34 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
  20/12/07 00:09:34 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
  20/12/07 00:09:34 INFO DiskBlockManager: Created local directory at /var/data/spark-9950d0f1-8753-441f-97cf-1aa6defd1d0e/blockmgr-9f6bcf4d-ff41-4b27-8312-0fb23bf4ed1b
  20/12/07 00:09:34 INFO MemoryStore: MemoryStore started with capacity 546.3 MB
  20/12/07 00:09:34 INFO SparkEnv: Registering OutputCommitCoordinator
  20/12/07 00:09:34 INFO Utils: Successfully started service 'SparkUI' on port 4040.
  20/12/07 00:09:34 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://spark-test-app-1607299769587-driver-svc.7c85102d1c1d4c8fb5a453963ab5535a.svc:4040
  20/12/07 00:09:34 INFO SparkContext: Added file file:///opt/spark/tests/worker_memory_check.py at spark://spark-test-app-1607299769587-driver-svc.7c85102d1c1d4c8fb5a453963ab5535a.svc:7078/files/worker_memory_check.py with timestamp 1607299774831
  20/12/07 00:09:34 INFO Utils: Copying /opt/spark/tests/worker_memory_check.py to /var/data/spark-9950d0f1-8753-441f-97cf-1aa6defd1d0e/spark-8ae1ff4f-2989-43d3-adfe-f26e8ff71ed2/userFiles-cfe3880e-6803-4809-9c01-6f1f582e4481/worker_memory_check.py
  20/12/07 00:09:34 INFO SparkContext: Added file file:///opt/spark/tests/py_container_checks.py at spark://spark-test-app-1607299769587-driver-svc.7c85102d1c1d4c8fb5a453963ab5535a.svc:7078/files/py_container_checks.py with timestamp 1607299774847
  20/12/07 00:09:34 INFO Utils: Copying /opt/spark/tests/py_container_checks.py to /var/data/spark-9950d0f1-8753-441f-97cf-1aa6defd1d0e/spark-8ae1ff4f-2989-43d3-adfe-f26e8ff71ed2/userFiles-cfe3880e-6803-4809-9c01-6f1f582e4481/py_container_checks.py
  20/12/07 00:09:36 INFO ExecutorPodsAllocator: Going to request 1 executors from Kubernetes.
  20/12/07 00:09:36 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 7079.
  20/12/07 00:09:36 INFO NettyBlockTransferService: Server created on spark-test-app-1607299769587-driver-svc.7c85102d1c1d4c8fb5a453963ab5535a.svc:7079
  20/12/07 00:09:36 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
  20/12/07 00:09:36 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, spark-test-app-1607299769587-driver-svc.7c85102d1c1d4c8fb5a453963ab5535a.svc, 7079, None)
  20/12/07 00:09:36 INFO BlockManagerMasterEndpoint: Registering block manager spark-test-app-1607299769587-driver-svc.7c85102d1c1d4c8fb5a453963ab5535a.svc:7079 with 546.3 MB RAM, BlockManagerId(driver, spark-test-app-1607299769587-driver-svc.7c85102d1c1d4c8fb5a453963ab5535a.svc, 7079, None)
  20/12/07 00:09:36 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, spark-test-app-1607299769587-driver-svc.7c85102d1c1d4c8fb5a453963ab5535a.svc, 7079, None)
  20/12/07 00:09:36 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, spark-test-app-1607299769587-driver-svc.7c85102d1c1d4c8fb5a453963ab5535a.svc, 7079, None)
  20/12/07 00:10:06 INFO KubernetesClusterSchedulerBackend: SchedulerBackend is ready for scheduling beginning after waiting maxRegisteredResourcesWaitingTime: 30000(ms)
  20/12/07 00:10:06 INFO SharedState: Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir ('file:/opt/spark/work-dir/spark-warehouse').
  20/12/07 00:10:06 INFO SharedState: Warehouse path is 'file:/opt/spark/work-dir/spark-warehouse'.
  20/12/07 00:10:07 INFO StateStoreCoordinatorRef: Registered StateStoreCoordinator endpoint
  20/12/07 00:10:07 INFO SparkContext: Starting job: collect at /opt/spark/tests/worker_memory_check.py:43
  20/12/07 00:10:07 INFO DAGScheduler: Got job 0 (collect at /opt/spark/tests/worker_memory_check.py:43) with 2 output partitions
  20/12/07 00:10:07 INFO DAGScheduler: Final stage: ResultStage 0 (collect at /opt/spark/tests/worker_memory_check.py:43)
  20/12/07 00:10:07 INFO DAGScheduler: Parents of final stage: List()
  20/12/07 00:10:07 INFO DAGScheduler: Missing parents: List()
  20/12/07 00:10:07 INFO DAGScheduler: Submitting ResultStage 0 (PythonRDD[1] at collect at /opt/spark/tests/worker_memory_check.py:43), which has no missing parents
  20/12/07 00:10:07 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 4.5 KB, free 546.3 MB)
  20/12/07 00:10:07 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 3.1 KB, free 546.3 MB)
  20/12/07 00:10:07 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on spark-test-app-1607299769587-driver-svc.7c85102d1c1d4c8fb5a453963ab5535a.svc:7079 (size: 3.1 KB, free: 546.3 MB)
  20/12/07 00:10:07 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1184
  20/12/07 00:10:08 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 0 (PythonRDD[1] at collect at /opt/spark/tests/worker_memory_check.py:43) (first 15 tasks are for partitions Vector(0, 1))
  20/12/07 00:10:08 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
  20/12/07 00:10:23 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
  20/12/07 00:10:38 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
  20/12/07 00:10:53 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
  20/12/07 00:11:08 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
  20/12/07 00:11:23 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
  20/12/07 00:11:38 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
  " did not contain "PySpark Worker Memory Check is: True" The application did not complete.. (KubernetesSuite.scala:249)
{code}
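
The failing assertion (KubernetesSuite.scala:249) is a ScalaTest {{eventually}} poll over the driver log. A minimal sketch of where the timeout lives and how it could be raised, assuming ScalaTest's PatienceConfiguration API; the object name, helper method, and the 3-minute value are illustrative, not the actual patch:

{code}
import org.scalatest.concurrent.{Eventually, PatienceConfiguration}
import org.scalatest.time.{Minutes, Seconds, Span}

// Illustrative sketch only. The K8s integration tests poll the driver
// log with ScalaTest's `eventually`; the log above shows the poll giving
// up after 70 attempts over ~2 minutes. Raising the patience Timeout
// (3 minutes here is an assumed value) gives slow Jenkins workers more
// time to schedule the executor pod before the check fails.
object K8sITPatience extends Eventually {
  val TIMEOUT = PatienceConfiguration.Timeout(Span(3, Minutes))
  val INTERVAL = PatienceConfiguration.Interval(Span(1, Seconds))

  // Hypothetical helper: fail only if `expected` never shows up in the
  // driver log within TIMEOUT, re-reading the log every INTERVAL.
  def expectDriverLogLine(driverLog: () => String, expected: String): Unit = {
    eventually(TIMEOUT, INTERVAL) {
      assert(driverLog().contains(expected), s"did not contain $expected")
    }
  }
}
{code}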


