Farhan Khan created SPARK-33199:
-----------------------------------
Summary: Mesos Task Failed when pyFiles and docker image option
used together
Key: SPARK-33199
URL: https://issues.apache.org/jira/browse/SPARK-33199
Project: Spark
Issue Type: Bug
Components: Mesos
Affects Versions: 3.0.1
Reporter: Farhan Khan
This is a bug related to shell escaping in MesosClusterScheduler. As you can see
in SPARK-23941, we escape all of the options, which also escapes the
*$MESOS_SANDBOX* variable used in the pyFiles values when the docker image
option is set. The variable is therefore never expanded by the shell, which in
turn causes a *FileNotFoundException* during *py4j.SparkSession.getOrCreate*.
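The effect of the blanket escaping can be sketched outside of Spark with a plain shell snippet (hypothetical paths; {{MESOS_SANDBOX}} is normally set by the Mesos agent, we fake it here):

```shell
# MESOS_SANDBOX is exported by the Mesos agent inside the container; fake it.
export MESOS_SANDBOX=/mnt/mesos/sandbox

# Unescaped, the shell expands the variable into the real sandbox path:
sh -c 'echo $MESOS_SANDBOX/PySparkTestInclude.py'
# -> /mnt/mesos/sandbox/PySparkTestInclude.py

# With the blanket escaping added in SPARK-23941 the value is effectively
# single-quoted, so the literal string survives into the driver's file path:
sh -c "echo '\$MESOS_SANDBOX/PySparkTestInclude.py'"
# -> $MESOS_SANDBOX/PySparkTestInclude.py
```

The second form is what produces the {{file:/mnt/mesos/sandbox/$MESOS_SANDBOX/PySparkTestInclude.py}} path visible in the stack trace below.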
To reproduce the bug, you need Mesos and the Mesos dispatcher running. Launch
the following command:
{code:bash}
spark-submit --master mesos://127.0.1.1:7077 \
  --deploy-mode cluster \
  --class my_python_program.py \
  --py-files /path/to/PySparkInclude.py \
  --name my_python_program \
  --conf spark.mesos.executor.docker.image=myrepo/myimage:mytag \
  /path/to/my_python_program.py 100
{code}
then the task fails with the following output:
{code:bash}
I1020 16:26:11.958860 17736 fetcher.cpp:562] Fetcher Info:
{"cache_directory":"/tmp/mesos/fetch/nobody","items":[{"action":"BYPASS_CACHE","uri":{"cache":false,"extract":true,"value":"/path/to/my_python_program.py"}},{"action":"BYPASS_CACHE","uri":{"cache":false,"extract":true,"value":"/path/to/PySparkTestInclude.py"}}],"sandbox_directory":"/var/lib/mesos/slave/slaves/746d6707-8689-4698-9b93-7368ad523421-S6/frameworks/746d6707-8689-4698-9b93-7368ad523421-0016/executors/driver-20201020162611-0012/runs/f98449ad-ecba-4bc9-a979-99f21f227375","stall_timeout":{"nanoseconds":60000000000},"user":"nobody"}
I1020 16:26:11.962540 17736 fetcher.cpp:459] Fetching URI
'/path/to/my_python_program.py'
I1020 16:26:11.962558 17736 fetcher.cpp:290] Fetching
'/path/to/my_python_program.py' directly into the sandbox directory
I1020 16:26:11.962584 17736 fetcher.cpp:175] Downloading resource from
'/path/to/my_python_program.py' to
'/var/lib/mesos/slave/slaves/746d6707-8689-4698-9b93-7368ad523421-S6/frameworks/746d6707-8689-4698-9b93-7368ad523421-0016/executors/driver-20201020162611-0012/runs/f98449ad-ecba-4bc9-a979-99f21f227375/pi_with_include.py'
W1020 16:26:12.126960 17736 fetcher.cpp:332] Copying instead of extracting
resource from URI with 'extract' flag, because it does not seem to be an
archive: /path/to/my_python_program.py
I1020 16:26:12.126986 17736 fetcher.cpp:618] Fetched
'/path/to/my_python_program.py' to
'/var/lib/mesos/slave/slaves/746d6707-8689-4698-9b93-7368ad523421-S6/frameworks/746d6707-8689-4698-9b93-7368ad523421-0016/executors/driver-20201020162611-0012/runs/f98449ad-ecba-4bc9-a979-99f21f227375/pi_with_include.py'
I1020 16:26:12.126991 17736 fetcher.cpp:459] Fetching URI
'/path/to/PySparkTestInclude.py'
I1020 16:26:12.126996 17736 fetcher.cpp:290] Fetching
'/path/to/PySparkTestInclude.py' directly into the sandbox directory
I1020 16:26:12.127013 17736 fetcher.cpp:175] Downloading resource from
'/path/to/PySparkTestInclude.py' to
'/var/lib/mesos/slave/slaves/746d6707-8689-4698-9b93-7368ad523421-S6/frameworks/746d6707-8689-4698-9b93-7368ad523421-0016/executors/driver-20201020162611-0012/runs/f98449ad-ecba-4bc9-a979-99f21f227375/PySparkTestInclude.py'
W1020 16:26:12.283085 17736 fetcher.cpp:332] Copying instead of extracting
resource from URI with 'extract' flag, because it does not seem to be an
archive: /path/to/PySparkTestInclude.py
I1020 16:26:12.283108 17736 fetcher.cpp:618] Fetched
'/path/to/PySparkTestInclude.py' to
'/var/lib/mesos/slave/slaves/746d6707-8689-4698-9b93-7368ad523421-S6/frameworks/746d6707-8689-4698-9b93-7368ad523421-0016/executors/driver-20201020162611-0012/runs/f98449ad-ecba-4bc9-a979-99f21f227375/PySparkTestInclude.py'
I1020 16:26:12.283114 17736 fetcher.cpp:623] Successfully fetched all URIs into
'/var/lib/mesos/slave/slaves/746d6707-8689-4698-9b93-7368ad523421-S6/frameworks/746d6707-8689-4698-9b93-7368ad523421-0016/executors/driver-20201020162611-0012/runs/f98449ad-ecba-4bc9-a979-99f21f227375'
Marked '/' as rslave
W1020 16:26:12.389969 2 openssl.cpp:553] Failed SSL connections will be
downgraded to a non-SSL socket
W1020 16:26:12.390067 2 openssl.cpp:575] TLS certificate verification was
enabled by setting one of LIBPROCESS_SSL_VERIFY_CERT or
LIBPROCESS_SSL_REQUIRE_CERT, but can be bypassed because TLS downgrades are
enabled.
I1020 16:26:12.390079 2 openssl.cpp:599] CA directory path unspecified!
NOTE: Set CA directory path with LIBPROCESS_SSL_CA_DIR=<dirpath>
I1020 16:26:12.390087 2 openssl.cpp:623] Will use IP address verification
in subject alternative name certificate extension.
I1020 16:26:12.390095 2 openssl.cpp:642] Will verify server certificates
for outgoing TLS connections.
I1020 16:26:12.390105 2 openssl.cpp:663] Using 'openssl' scheme for
hostname validation
I1020 16:26:12.390331 2 openssl.cpp:688] Using CA file:
/var/lib/mesos/slave/slaves/746d6707-8689-4698-9b93-7368ad523421-S6/frameworks/746d6707-8689-4698-9b93-7368ad523421-0016/executors/driver-20201020162611-0012/runs/f98449ad-ecba-4bc9-a979-99f21f227375/.ssl/ca-bundle.crt
W1020 16:26:12.390651 2 process.cpp:1055] Failed SSL connections will be
downgraded to a non-SSL socket
I1020 16:26:12.392330 2 executor.cpp:210] Version: 1.10.1
I1020 16:26:12.392472 2 executor.cpp:286] Using URL
http+unix://%2Fvar%2Frun%2Fmesos%2Fmesos-executors.sock/slave(1)/api/v1/executor
for the executor API endpoint
I1020 16:26:12.398231 8 executor.cpp:190] Received SUBSCRIBED event
I1020 16:26:12.398533 8 executor.cpp:194] Subscribed executor on
172.16.72.91
I1020 16:26:12.398721 8 executor.cpp:190] Received LAUNCH event
I1020 16:26:12.399922 8 executor.cpp:671] Overwriting environment variable
'LIBPROCESS_SSL_CA_FILE'
I1020 16:26:12.399947 8 executor.cpp:671] Overwriting environment variable
'LIBPROCESS_SSL_CERT_FILE'
I1020 16:26:12.399964 8 executor.cpp:671] Overwriting environment variable
'MESOS_NATIVE_JAVA_LIBRARY'
I1020 16:26:12.399986 8 executor.cpp:671] Overwriting environment variable
'LIBPROCESS_SSL_KEY_FILE'
I1020 16:26:12.400000 8 executor.cpp:671] Overwriting environment variable
'PATH'
I1020 16:26:12.400053 8 executor.cpp:722] Starting task
driver-20201020162611-0012
I1020 16:26:12.401970 8 executor.cpp:740] Forked command at 12
I1020 16:26:12.403370 9 executor.cpp:190] Received ACKNOWLEDGED event
I1020 16:26:12.404062 9 executor.cpp:190] Received ACKNOWLEDGED event
Preparing rootfs at
'/var/lib/mesos/slave/provisioner/containers/f98449ad-ecba-4bc9-a979-99f21f227375/backends/overlay/rootfses/f80c3e69-73e8-4154-a8e5-8c68613f6462'
Marked '/' as rslave
Changing root to
/var/lib/mesos/slave/provisioner/containers/f98449ad-ecba-4bc9-a979-99f21f227375/backends/overlay/rootfses/f80c3e69-73e8-4154-a8e5-8c68613f6462
spark-env: Configured SPARK_LOCAL_IP with bootstrap: 172.16.72.91
spark-env: User: nobody
spark-env: No SPARK_MESOS_KRB5_CONF_BASE64 to decode
spark-env: Configuring StatsD endpoint 198.51.100.1:46584 in
/opt/spark/conf/metrics.properties
20/10/20 16:26:15 WARN NativeCodeLoader: Unable to load native-hadoop library
for your platform... using builtin-java classes where applicable
20/10/20 16:26:17 INFO SparkContext: Running Spark version 3.0.1
20/10/20 16:26:17 INFO ResourceUtils:
==============================================================
20/10/20 16:26:17 INFO ResourceUtils: Resources for spark.driver:
20/10/20 16:26:17 INFO ResourceUtils:
==============================================================
20/10/20 16:26:17 INFO SparkContext: Submitted application: PythonPi
20/10/20 16:26:17 INFO SecurityManager: Changing view acls to: nobody
20/10/20 16:26:17 INFO SecurityManager: Changing modify acls to: nobody
20/10/20 16:26:17 INFO SecurityManager: Changing view acls groups to:
20/10/20 16:26:17 INFO SecurityManager: Changing modify acls groups to:
20/10/20 16:26:17 INFO SecurityManager: SecurityManager: authentication
disabled; ui acls disabled; users with view permissions: Set(nobody); groups
with view permissions: Set(); users with modify permissions: Set(nobody);
groups with modify permissions: Set()
20/10/20 16:26:17 INFO Utils: Successfully started service 'sparkDriver' on
port 37759.
20/10/20 16:26:17 INFO SparkEnv: Registering MapOutputTracker
20/10/20 16:26:18 INFO SparkEnv: Registering BlockManagerMaster
20/10/20 16:26:18 INFO BlockManagerMasterEndpoint: Using
org.apache.spark.storage.DefaultTopologyMapper for getting topology information
20/10/20 16:26:18 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
20/10/20 16:26:18 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
20/10/20 16:26:18 INFO DiskBlockManager: Created local directory at
/mnt/mesos/sandbox/blockmgr-2be3fdd6-db33-49e8-bc0a-5b1da5ec8ded
20/10/20 16:26:18 INFO MemoryStore: MemoryStore started with capacity 912.3 MiB
20/10/20 16:26:18 INFO MetricsConfig: Loading metrics properties from resource
metrics.properties
20/10/20 16:26:18 INFO MetricsConfig: Metrics properties:
{*.sink.statsd.class=org.apache.spark.metrics.sink.statsd.StatsdSink,
*.sink.statsd.host=198.51.100.1,
*.sink.servlet.class=org.apache.spark.metrics.sink.MetricsServlet,
applications.sink.servlet.path=/metrics/applications/json, *.sink.statsd.tags=,
*.source.jvm.class=org.apache.spark.metrics.source.JvmSource,
*.sink.statsd.port=46584, master.sink.servlet.path=/metrics/master/json,
*.sink.statsd.prefix=, *.sink.servlet.path=/metrics/json}
20/10/20 16:26:18 INFO SparkEnv: Registering OutputCommitCoordinator
20/10/20 16:26:18 INFO log: Logging initialized @5603ms to
org.sparkproject.jetty.util.log.Slf4jLog
20/10/20 16:26:18 INFO Server: jetty-9.4.z-SNAPSHOT; built:
2019-04-29T20:42:08.989Z; git: e1bc35120a6617ee3df052294e433f3a25ce7097; jvm
1.8.0_212-b03
20/10/20 16:26:18 INFO Server: Started @5872ms
20/10/20 16:26:18 INFO AbstractConnector: Started
ServerConnector@1345a271{HTTP/1.1,[http/1.1]}{172.16.72.91:4040}
20/10/20 16:26:18 INFO Utils: Successfully started service 'SparkUI' on port
4040.
20/10/20 16:26:18 INFO ContextHandler: Started
o.s.j.s.ServletContextHandler@37925df7{/jobs,null,AVAILABLE,@Spark}
20/10/20 16:26:18 INFO ContextHandler: Started
o.s.j.s.ServletContextHandler@31660e32{/jobs/json,null,AVAILABLE,@Spark}
20/10/20 16:26:18 INFO ContextHandler: Started
o.s.j.s.ServletContextHandler@68c3022c{/jobs/job,null,AVAILABLE,@Spark}
20/10/20 16:26:18 INFO ContextHandler: Started
o.s.j.s.ServletContextHandler@9662252{/jobs/job/json,null,AVAILABLE,@Spark}
20/10/20 16:26:18 INFO ContextHandler: Started
o.s.j.s.ServletContextHandler@75e43d99{/stages,null,AVAILABLE,@Spark}
20/10/20 16:26:18 INFO ContextHandler: Started
o.s.j.s.ServletContextHandler@74c915ac{/stages/json,null,AVAILABLE,@Spark}
20/10/20 16:26:18 INFO ContextHandler: Started
o.s.j.s.ServletContextHandler@7d15bd53{/stages/stage,null,AVAILABLE,@Spark}
20/10/20 16:26:18 INFO ContextHandler: Started
o.s.j.s.ServletContextHandler@10d10bcf{/stages/stage/json,null,AVAILABLE,@Spark}
20/10/20 16:26:18 INFO ContextHandler: Started
o.s.j.s.ServletContextHandler@6ce43fb0{/stages/pool,null,AVAILABLE,@Spark}
20/10/20 16:26:18 INFO ContextHandler: Started
o.s.j.s.ServletContextHandler@42f60f93{/stages/pool/json,null,AVAILABLE,@Spark}
20/10/20 16:26:18 INFO ContextHandler: Started
o.s.j.s.ServletContextHandler@228a2392{/storage,null,AVAILABLE,@Spark}
20/10/20 16:26:18 INFO ContextHandler: Started
o.s.j.s.ServletContextHandler@50576ab{/storage/json,null,AVAILABLE,@Spark}
20/10/20 16:26:18 INFO ContextHandler: Started
o.s.j.s.ServletContextHandler@6de665b4{/storage/rdd,null,AVAILABLE,@Spark}
20/10/20 16:26:18 INFO ContextHandler: Started
o.s.j.s.ServletContextHandler@5fb35a9d{/storage/rdd/json,null,AVAILABLE,@Spark}
20/10/20 16:26:18 INFO ContextHandler: Started
o.s.j.s.ServletContextHandler@5bb2641{/environment,null,AVAILABLE,@Spark}
20/10/20 16:26:18 INFO ContextHandler: Started
o.s.j.s.ServletContextHandler@5f7aae9d{/environment/json,null,AVAILABLE,@Spark}
20/10/20 16:26:18 INFO ContextHandler: Started
o.s.j.s.ServletContextHandler@29b1337f{/executors,null,AVAILABLE,@Spark}
20/10/20 16:26:18 INFO ContextHandler: Started
o.s.j.s.ServletContextHandler@7af3cf55{/executors/json,null,AVAILABLE,@Spark}
20/10/20 16:26:18 INFO ContextHandler: Started
o.s.j.s.ServletContextHandler@b196f0b{/executors/threadDump,null,AVAILABLE,@Spark}
20/10/20 16:26:18 INFO ContextHandler: Started
o.s.j.s.ServletContextHandler@2383aabe{/executors/threadDump/json,null,AVAILABLE,@Spark}
20/10/20 16:26:18 INFO ContextHandler: Started
o.s.j.s.ServletContextHandler@545844d3{/static,null,AVAILABLE,@Spark}
20/10/20 16:26:18 INFO ContextHandler: Started
o.s.j.s.ServletContextHandler@1adf4e59{/,null,AVAILABLE,@Spark}
20/10/20 16:26:18 INFO ContextHandler: Started
o.s.j.s.ServletContextHandler@2ae8fb6c{/api,null,AVAILABLE,@Spark}
20/10/20 16:26:18 INFO ContextHandler: Started
o.s.j.s.ServletContextHandler@21e2f70e{/jobs/job/kill,null,AVAILABLE,@Spark}
20/10/20 16:26:18 INFO ContextHandler: Started
o.s.j.s.ServletContextHandler@6199fb32{/stages/stage/kill,null,AVAILABLE,@Spark}
20/10/20 16:26:18 INFO SparkUI: Bound SparkUI to 172.16.72.91, and started at
http://172.16.72.91:4040
20/10/20 16:26:18 ERROR SparkContext: Error initializing SparkContext.
java.io.FileNotFoundException: File
file:/mnt/mesos/sandbox/$MESOS_SANDBOX/PySparkTestInclude.py does not exist
at
org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:666)
at
org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:987)
at
org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:656)
at
org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:454)
at org.apache.spark.SparkContext.addFile(SparkContext.scala:1534)
at org.apache.spark.SparkContext.addFile(SparkContext.scala:1498)
at org.apache.spark.SparkContext.$anonfun$new$12(SparkContext.scala:494)
at
org.apache.spark.SparkContext.$anonfun$new$12$adapted(SparkContext.scala:494)
at scala.collection.immutable.List.foreach(List.scala:392)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:494)
at
org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:238)
at
py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)
20/10/20 16:26:18 INFO AbstractConnector: Stopped
Spark@1345a271{HTTP/1.1,[http/1.1]}{172.16.72.91:4040}
20/10/20 16:26:18 INFO SparkUI: Stopped Spark web UI at http://172.16.72.91:4040
20/10/20 16:26:19 INFO MapOutputTrackerMasterEndpoint:
MapOutputTrackerMasterEndpoint stopped!
20/10/20 16:26:19 INFO MemoryStore: MemoryStore cleared
20/10/20 16:26:19 INFO BlockManager: BlockManager stopped
20/10/20 16:26:19 INFO BlockManagerMaster: BlockManagerMaster stopped
20/10/20 16:26:19 WARN MetricsSystem: Stopping a MetricsSystem that is not
running
20/10/20 16:26:19 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint:
OutputCommitCoordinator stopped!
20/10/20 16:26:19 INFO SparkContext: Successfully stopped SparkContext
20/10/20 16:26:19 INFO ShutdownHookManager: Shutdown hook called
20/10/20 16:26:19 INFO ShutdownHookManager: Shutdown hook called
20/10/20 16:26:19 INFO ShutdownHookManager: Deleting directory
/tmp/spark-07db5724-6031-45d3-8222-c06947f80872
20/10/20 16:26:19 INFO ShutdownHookManager: Deleting directory
/mnt/mesos/sandbox/spark-d4145378-f0da-4f1b-8aee-a72d427ccb8f
I1020 16:26:19.363329 3 executor.cpp:1041] Command exited with status 1
(pid: 12)
I1020 16:26:19.366904 4 executor.cpp:190] Received ACKNOWLEDGED event
I1020 16:26:19.367244 2 process.cpp:935] Stopped the socket accept loop
{code}
--
This message was sent by Atlassian Jira
(v8.3.4#803005)