[ https://issues.apache.org/jira/browse/SPARK-16379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15363289#comment-15363289 ]
Michael Gummelt edited comment on SPARK-16379 at 7/5/16 9:34 PM:
-----------------------------------------------------------------
I say we add a new lock to synchronize on and be done with it. The root of the issue is that deadlock detection is hard. The author of the breaking change added a critical region, and to do so safely you have to ensure that no calling code path has already acquired the same lock, which is difficult (undecidable in general). The only process change I can imagine that would address the higher-level issue is running some sort of deadlock detection tool in the Spark tests.

I agree we shouldn't get rid of `lazy val` entirely, but it is unfortunate that you can't safely use one inside a `synchronized` block. It's a leaky abstraction. This seems to be addressed here: http://docs.scala-lang.org/sips/pending/improved-lazy-val-initialization.html

> One other thing I hope holds is that no new commit should break the project, even if it fixes something or reveals another issue.

Well, I do agree with Sean that it's on us to fix bugs revealed by external changes.
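The "add a new lock" fix can be sketched as follows. In Scala 2, initializing a `lazy val` synchronizes on the enclosing instance, so a `this.synchronized` critical region racing with lazy-val initialization on another thread can deadlock; synchronizing on a private lock object keeps the two monitors disjoint. This is a minimal sketch, not Spark's actual code — `SchedulerState`, `stateLock`, and `addOffer` are hypothetical names:

```scala
// Hypothetical sketch of the dedicated-lock pattern discussed above.
class SchedulerState {
  // Dedicated lock object: lazy-val initialization never contends for it,
  // because lazy vals lock the monitor of `this`, not `stateLock`.
  private val stateLock = new Object

  // Stand-in for the expensive lazily-initialized member (e.g. a logger).
  @transient lazy val log: String = "logger-initialized"

  private var offers: List[String] = Nil

  def addOffer(o: String): Unit = stateLock.synchronized {
    // Safe even if another thread is concurrently initializing `log`
    // (and therefore holding the monitor of `this`): we only hold stateLock.
    offers = o :: offers
  }

  def snapshot: List[String] = stateLock.synchronized(offers)
}

object Demo extends App {
  val s = new SchedulerState
  val threads = (1 to 4).map(i => new Thread(() => s.addOffer(s"offer-$i")))
  threads.foreach(_.start())
  threads.foreach(_.join())
  assert(s.snapshot.size == 4)
  println(s.snapshot.sorted.mkString(","))
}
```

Had `addOffer` used `this.synchronized` instead, a thread initializing `log` while another entered `addOffer` could interleave into exactly the kind of deadlock described in this issue.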
> Spark on mesos is broken due to race condition in Logging
> ---------------------------------------------------------
>
> Key: SPARK-16379
> URL: https://issues.apache.org/jira/browse/SPARK-16379
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 2.0.0
> Reporter: Stavros Kontopoulos
> Priority: Blocker
> Attachments: out.txt
>
> This commit introduced a transient lazy log val:
> https://github.com/apache/spark/commit/044971eca0ff3c2ce62afa665dbd3072d52cbbec
> This has caused problems in the past:
> https://github.com/apache/spark/pull/1004
> One commit before that, everything works fine.
> I spotted this when my CI started to fail:
> https://ci.typesafe.com/job/mit-docker-test-ref/191/
> You can easily verify it by installing mesos on your machine and trying to connect with spark-shell from the bin dir:
> ./spark-shell --master mesos://zk://localhost:2181/mesos --conf spark.executor.url=$(pwd)/../spark-2.0.0-SNAPSHOT-bin-test.tgz
> It gets stuck at the point where it tries to create the SparkContext. Logging gets stuck here:
> I0705 12:10:10.076617 9303 group.cpp:700] Trying to get '/mesos/json.info_0000000152' in ZooKeeper
> I0705 12:10:10.076920 9304 detector.cpp:479] A new leading master (UPID=master@127.0.1.1:5050) is detected
> I0705 12:10:10.076956 9303 sched.cpp:326] New master detected at master@127.0.1.1:5050
> I0705 12:10:10.077057 9303 sched.cpp:336] No credentials provided. Attempting to register without authentication
> I0705 12:10:10.090709 9301 sched.cpp:703] Framework registered with 13553f8b-f42c-4f20-88cd-16f1cc153ede-0001
> I verified this also by changing @transient lazy val log to def, and it works as expected.
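The workaround verified in the report — replacing the `@transient lazy val log` with a `def` — can be sketched like this. A `def` acquires no monitor on the enclosing instance, so it cannot participate in the lazy-val initialization deadlock; the trade-off is that the logger is looked up on every call rather than cached in a field. Spark's real `Logging` trait uses slf4j; the `Logger` stand-in and `MesosScheduler` name below are illustrative only:

```scala
// Minimal stand-in logger so the sketch is self-contained
// (Spark actually uses slf4j here; these names are hypothetical).
class Logger(val name: String)
object Logger {
  private val cache = new java.util.concurrent.ConcurrentHashMap[String, Logger]()
  def get(name: String): Logger = cache.computeIfAbsent(name, new Logger(_))
}

trait Logging {
  // Before (deadlock-prone): lazy-val initialization takes the monitor of
  // the enclosing instance, which can interleave badly with locks held on
  // the Mesos scheduler callback threads:
  //   @transient lazy val log: Logger = Logger.get(getClass.getName)

  // After (the workaround from the report): a def acquires no lock on
  // `this`; each call goes through the (thread-safe) lookup instead.
  def log: Logger = Logger.get(getClass.getName)
}

class MesosScheduler extends Logging

object Demo extends App {
  val s = new MesosScheduler
  assert(s.log.name == classOf[MesosScheduler].getName)
  println("def-based log avoids the lazy-val monitor: " + s.log.name)
}
```

The per-call lookup cost is why this is a workaround rather than a fix; the comment above argues the proper fix is a dedicated lock, with SIP-20 addressing the underlying `lazy val` semantics.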
--
This message was sent by Atlassian JIRA (v6.3.4#6332)