bounkong khamphousone created SPARK-23941:
---------------------------------------------

             Summary: Mesos task failed on specific spark app name
                 Key: SPARK-23941
                 URL: https://issues.apache.org/jira/browse/SPARK-23941
             Project: Spark
          Issue Type: Bug
          Components: Mesos, Spark Submit
    Affects Versions: 2.3.0, 2.2.1
         Environment: OS: Ubuntu 16.04

Spark: 2.3.0

Mesos: 1.5.0
            Reporter: bounkong khamphousone


This appears to be a bug in Spark's MesosClusterDispatcher. To reproduce it, 
you need Mesos and the Spark Mesos dispatcher running.

I'm currently running Mesos 1.5 and Spark 2.3.0 (also tried with 2.2.1).

If you submit the following application:

 
{code:java}
spark-submit --master mesos://127.0.1.1:7077 --deploy-mode cluster \
  --class org.apache.spark.examples.SparkPi \
  --name "my favorite task (myId = 123-456)" \
  /home/tiboun/tools/spark/examples/jars/spark-examples_2.11-2.3.0.jar 100
{code}
then the task fails with the following output:

 
{code:java}
I0409 11:00:35.360352 22726 fetcher.cpp:551] Fetcher Info: 
{"cache_directory":"\/tmp\/mesos\/fetch\/tiboun","items":[{"action":"BYPASS_CACHE","uri":{"cache":false,"extract":true,"value":"\/home\/tiboun\/tools\/spark\/examples\/jars\/spark-examples_2.11-2.3.0.jar"}}],"sandbox_directory":"\/var\/lib\/mesos\/slaves\/0262246c-14a3-4408-9b74-5e3b65dc1344-S0\/frameworks\/edff1a6f-38c6-46e0-a3c1-62a8fbfc2b5d-0014\/executors\/driver-20180409110035-0004\/runs\/8ac20902-74e1-45c4-9ab6-c52a79940189","user":"tiboun"}
I0409 11:00:35.363119 22726 fetcher.cpp:450] Fetching URI 
'/home/tiboun/tools/spark/examples/jars/spark-examples_2.11-2.3.0.jar'
I0409 11:00:35.363143 22726 fetcher.cpp:291] Fetching directly into the sandbox 
directory
I0409 11:00:35.363168 22726 fetcher.cpp:225] Fetching URI 
'/home/tiboun/tools/spark/examples/jars/spark-examples_2.11-2.3.0.jar'
W0409 11:00:35.366839 22726 fetcher.cpp:330] Copying instead of extracting 
resource from URI with 'extract' flag, because it does not seem to be an 
archive: /home/tiboun/tools/spark/examples/jars/spark-examples_2.11-2.3.0.jar
I0409 11:00:35.366873 22726 fetcher.cpp:603] Fetched 
'/home/tiboun/tools/spark/examples/jars/spark-examples_2.11-2.3.0.jar' to 
'/var/lib/mesos/slaves/0262246c-14a3-4408-9b74-5e3b65dc1344-S0/frameworks/edff1a6f-38c6-46e0-a3c1-62a8fbfc2b5d-0014/executors/driver-20180409110035-0004/runs/8ac20902-74e1-45c4-9ab6-c52a79940189/spark-examples_2.11-2.3.0.jar'
I0409 11:00:35.366878 22726 fetcher.cpp:608] Successfully fetched all URIs into 
'/var/lib/mesos/slaves/0262246c-14a3-4408-9b74-5e3b65dc1344-S0/frameworks/edff1a6f-38c6-46e0-a3c1-62a8fbfc2b5d-0014/executors/driver-20180409110035-0004/runs/8ac20902-74e1-45c4-9ab6-c52a79940189'
I0409 11:00:35.438725 22733 exec.cpp:162] Version: 1.5.0
I0409 11:00:35.440770 22734 exec.cpp:236] Executor registered on agent 
0262246c-14a3-4408-9b74-5e3b65dc1344-S0
I0409 11:00:35.441388 22733 executor.cpp:171] Received SUBSCRIBED event
I0409 11:00:35.441586 22733 executor.cpp:175] Subscribed executor on 
tiboun-Dell-Precision-M3800
I0409 11:00:35.441643 22733 executor.cpp:171] Received LAUNCH event
I0409 11:00:35.441767 22733 executor.cpp:638] Starting task 
driver-20180409110035-0004
I0409 11:00:35.445050 22733 executor.cpp:478] Running 
'/usr/libexec/mesos/mesos-containerizer launch <POSSIBLY-SENSITIVE-DATA>'
I0409 11:00:35.445770 22733 executor.cpp:651] Forked command at 22743
sh: 1: Syntax error: "(" unexpected
I0409 11:00:35.538661 22736 executor.cpp:938] Command exited with status 2 
(pid: 22743)
I0409 11:00:36.541016 22739 process.cpp:887] Failed to accept socket: future 
discarded
{code}
If you remove the parentheses, you get the following result:

 
{code:java}
I0409 11:03:02.023701 23085 fetcher.cpp:551] Fetcher Info: 
{"cache_directory":"\/tmp\/mesos\/fetch\/tiboun","items":[{"action":"BYPASS_CACHE","uri":{"cache":false,"extract":true,"value":"\/home\/tiboun\/tools\/spark\/examples\/jars\/spark-examples_2.11-2.3.0.jar"}}],"sandbox_directory":"\/var\/lib\/mesos\/slaves\/0262246c-14a3-4408-9b74-5e3b65dc1344-S0\/frameworks\/edff1a6f-38c6-46e0-a3c1-62a8fbfc2b5d-0014\/executors\/driver-20180409110301-0006\/runs\/f887c0ab-b48f-4382-850c-383c1c944269","user":"tiboun"}
I0409 11:03:02.028268 23085 fetcher.cpp:450] Fetching URI 
'/home/tiboun/tools/spark/examples/jars/spark-examples_2.11-2.3.0.jar'
I0409 11:03:02.028302 23085 fetcher.cpp:291] Fetching directly into the sandbox 
directory
I0409 11:03:02.028336 23085 fetcher.cpp:225] Fetching URI 
'/home/tiboun/tools/spark/examples/jars/spark-examples_2.11-2.3.0.jar'
W0409 11:03:02.031209 23085 fetcher.cpp:330] Copying instead of extracting 
resource from URI with 'extract' flag, because it does not seem to be an 
archive: /home/tiboun/tools/spark/examples/jars/spark-examples_2.11-2.3.0.jar
I0409 11:03:02.031250 23085 fetcher.cpp:603] Fetched 
'/home/tiboun/tools/spark/examples/jars/spark-examples_2.11-2.3.0.jar' to 
'/var/lib/mesos/slaves/0262246c-14a3-4408-9b74-5e3b65dc1344-S0/frameworks/edff1a6f-38c6-46e0-a3c1-62a8fbfc2b5d-0014/executors/driver-20180409110301-0006/runs/f887c0ab-b48f-4382-850c-383c1c944269/spark-examples_2.11-2.3.0.jar'
I0409 11:03:02.031258 23085 fetcher.cpp:608] Successfully fetched all URIs into 
'/var/lib/mesos/slaves/0262246c-14a3-4408-9b74-5e3b65dc1344-S0/frameworks/edff1a6f-38c6-46e0-a3c1-62a8fbfc2b5d-0014/executors/driver-20180409110301-0006/runs/f887c0ab-b48f-4382-850c-383c1c944269'
I0409 11:03:02.090797 23095 exec.cpp:162] Version: 1.5.0
I0409 11:03:02.095283 23092 exec.cpp:236] Executor registered on agent 
0262246c-14a3-4408-9b74-5e3b65dc1344-S0
I0409 11:03:02.096693 23095 executor.cpp:171] Received SUBSCRIBED event
I0409 11:03:02.097040 23095 executor.cpp:175] Subscribed executor on 
tiboun-Dell-Precision-M3800
I0409 11:03:02.097141 23095 executor.cpp:171] Received LAUNCH event
I0409 11:03:02.097357 23095 executor.cpp:638] Starting task 
driver-20180409110301-0006
I0409 11:03:02.101521 23095 executor.cpp:478] Running 
'/usr/libexec/mesos/mesos-containerizer launch <POSSIBLY-SENSITIVE-DATA>'
I0409 11:03:02.102332 23095 executor.cpp:651] Forked command at 23100
Error: Cannot load main class from JAR 
file:/var/lib/mesos/slaves/0262246c-14a3-4408-9b74-5e3b65dc1344-S0/frameworks/edff1a6f-38c6-46e0-a3c1-62a8fbfc2b5d-0014/executors/driver-20180409110301-0006/runs/f887c0ab-b48f-4382-850c-383c1c944269/favorite
Run with --help for usage help or --verbose for debug output
I0409 11:03:02.792325 23090 executor.cpp:938] Command exited with status 1 
(pid: 23100)
I0409 11:03:03.794505 23098 process.cpp:887] Failed to accept socket: future 
discarded
{code}
Interestingly, the launched command then tries to load the main class from a 
file called "favorite", which is one word of the task name. This suggests the 
name is split on whitespace before it reaches spark-submit.
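
One possible explanation for both failures, offered as a guess rather than a confirmed diagnosis, is that the name reaches `sh` without quoting. The Python sketch below only mimics POSIX word splitting with the standard `shlex` module. The helper `build_command` and the shortened command line are hypothetical and are not what MesosClusterDispatcher actually generates.

```python
import shlex

JAR = "spark-examples_2.11-2.3.0.jar"


def build_command(name: str, quote: bool) -> str:
    """Build a shortened, hypothetical driver command line as one string."""
    rendered = shlex.quote(name) if quote else name
    return f"spark-submit --name {rendered} {JAR} 100"


# Name without parentheses, inserted unquoted: it is split into three words,
# so "favorite" ends up where a positional argument (the JAR) is expected.
tokens_unquoted = shlex.split(build_command("my favorite task", quote=False))

# Full name, quoted before insertion: it survives as a single argument.
app_name = "my favorite task (myId = 123-456)"
tokens_quoted = shlex.split(build_command(app_name, quote=True))

print(tokens_unquoted)
print(tokens_quoted)
```

`shlex` does not model shell operators, so it cannot reproduce the `"(" unexpected` error itself. It only illustrates the word splitting that would explain why "favorite" is treated as the JAR.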

 

I've tried launching spark-shell with the same name and it works fine. There, 
the task names take the driver's name and append a sequence number to it.
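
Until this is fixed, a possible workaround on the submitting side is to restrict the name to characters that need no shell quoting. The sketch below is only an illustration: `safe_app_name` is a hypothetical helper, and the set of allowed characters is an assumption, not something taken from Spark or Mesos.

```python
import re


def safe_app_name(name: str) -> str:
    """Replace each run of characters outside [A-Za-z0-9._-] with one '_'."""
    cleaned = re.sub(r"[^A-Za-z0-9._-]+", "_", name)
    # Drop underscores left over at either end (e.g. from a trailing ')').
    return cleaned.strip("_")


sanitized = safe_app_name("my favorite task (myId = 123-456)")
print(sanitized)  # my_favorite_task_myId_123-456
```

The sanitized value can then be passed to `--name` in place of the original string, at the cost of a less readable name in the Mesos UI.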

 



