Stavros Kontopoulos created SPARK-16379:
-------------------------------------------

             Summary: Spark on mesos is broken due to race condition in Logging
                 Key: SPARK-16379
                 URL: https://issues.apache.org/jira/browse/SPARK-16379
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 2.0.0
            Reporter: Stavros Kontopoulos
            Priority: Blocker


This commit introduced a transient log val: 
https://github.com/apache/spark/commit/044971eca0ff3c2ce62afa665dbd3072d52cbbec

This has caused problems in the past:
https://github.com/apache/spark/pull/1004

With the commit immediately before that one, everything works fine.

I noticed the problem when my CI started to fail:
https://ci.typesafe.com/job/mit-docker-test-ref/191/

You can easily verify it by installing Mesos on your machine and trying to 
connect with spark-shell from the bin directory:

./spark-shell --master mesos://zk://localhost:2181/mesos \
  --conf spark.executor.url=$(pwd)/../spark-2.0.0-SNAPSHOT-bin-test.tgz

It gets stuck at the point where it tries to create the SparkContext.

The log output stops at this point:
I0705 12:10:10.076617  9303 group.cpp:700] Trying to get '/mesos/json.info_0000000152' in ZooKeeper
I0705 12:10:10.076920  9304 detector.cpp:479] A new leading master ([email protected]:5050) is detected
I0705 12:10:10.076956  9303 sched.cpp:326] New master detected at [email protected]:5050
I0705 12:10:10.077057  9303 sched.cpp:336] No credentials provided. Attempting to register without authentication
I0705 12:10:10.090709  9301 sched.cpp:703] Framework registered with 13553f8b-f42c-4f20-88cd-16f1cc153ede-0001

I also verified this by changing the @transient val log to a def, after which 
it works as expected.
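
For illustration, the val-to-def change can be sketched roughly as below. This 
is only a minimal sketch, not the actual patch: the names LoggingSketch, Demo, 
loggerName and sameLogger are hypothetical, and java.util.logging is assumed 
here only to keep the example self-contained. The logger is held in a 
@transient private var and created on first access through a method, instead 
of being exposed as a val.

```scala
import java.util.logging.Logger

// Hypothetical trait for illustration; this is not Spark's own Logging code.
trait LoggingSketch {
  // Cached logger instance; @transient keeps it out of serialized state.
  @transient private var log_ : Logger = null

  // A method rather than a val: the logger is created the first time it is
  // requested, not while the enclosing object is being initialized.
  protected def log: Logger = {
    if (log_ == null) {
      log_ = Logger.getLogger(getClass.getName.stripSuffix("$"))
    }
    log_
  }
}

// Hypothetical user of the trait.
object Demo extends LoggingSketch {
  // Name of the logger, exposed so the behaviour can be checked.
  def loggerName: String = log.getName

  // True when repeated calls return the same cached instance.
  def sameLogger: Boolean = log eq log

  def main(args: Array[String]): Unit = {
    log.info("logger created on first access")
  }
}
```

Note that this sketch makes no attempt to guard the null check against 
concurrent first access; two threads could each call getLogger, which for the 
same name is expected to hand back the same logger.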



