Andrew, is your issue also a regression from 1.0.0 to 1.0.1? The immediate priority is addressing regressions between these two releases.
On Mon, Jul 14, 2014 at 9:05 PM, Andrew Ash <and...@andrewash.com> wrote:
> I'm not sure either of those PRs will fix the concurrent adds to
> Configuration issue I observed. I've got a stack trace and writeup I'll
> share in an hour or two (traveling today).
>
> On Jul 14, 2014 9:50 PM, "scwf" <wangf...@huawei.com> wrote:
>> Hi Cody,
>> I ran into this issue a few days ago and posted a PR for it
>> (https://github.com/apache/spark/pull/1385).
>> It's very strange: if I synchronize on conf it deadlocks, but it is fine
>> when I synchronize on initLocalJobConfFuncOpt.
>>
>>> Here's the entire jstack output.
>>>
>>> On Mon, Jul 14, 2014 at 4:44 PM, Patrick Wendell <pwend...@gmail.com> wrote:
>>>
>>> Hey Cody,
>>>
>>> This jstack seems truncated; would you mind giving the entire stack
>>> trace? For the second thread, for instance, we can't see where the
>>> lock is being acquired.
>>>
>>> - Patrick
>>>
>>> On Mon, Jul 14, 2014 at 1:42 PM, Cody Koeninger
>>> <cody.koenin...@mediacrossing.com> wrote:
>>> > Hi all, just wanted to give a heads up that we're seeing a reproducible
>>> > deadlock with Spark 1.0.1 with Hadoop 2.3.0-mr1-cdh5.0.2.
>>> >
>>> > If JIRA is a better place for this, apologies in advance. Figured talking
>>> > about it on the mailing list was friendlier than randomly (re)opening
>>> > JIRA tickets.
>>> >
>>> > I know Gary had mentioned some issues with 1.0.1 on the mailing list;
>>> > once we got a thread dump I wanted to follow up.
>>> >
>>> > The thread dump shows the deadlock occurs in the synchronized block of
>>> > code that was changed in HadoopRDD.scala for the SPARK-1097 issue.
>>> >
>>> > Relevant portions of the thread dump are summarized below; we can
>>> > provide the whole dump if it's useful.
>>> >
>>> > Found one Java-level deadlock:
>>> > =============================
>>> > "Executor task launch worker-1":
>>> >   waiting to lock monitor 0x00007f250400c520
>>> >   (object 0x00000000fae7dc30, a org.apache.hadoop.conf.Configuration),
>>> >   which is held by "Executor task launch worker-0"
>>> > "Executor task launch worker-0":
>>> >   waiting to lock monitor 0x00007f2520495620
>>> >   (object 0x00000000faeb4fc8, a java.lang.Class),
>>> >   which is held by "Executor task launch worker-1"
>>> >
>>> > "Executor task launch worker-1":
>>> >   at org.apache.hadoop.conf.Configuration.reloadConfiguration(Configuration.java:791)
>>> >   - waiting to lock <0x00000000fae7dc30> (a org.apache.hadoop.conf.Configuration)
>>> >   at org.apache.hadoop.conf.Configuration.addDefaultResource(Configuration.java:690)
>>> >   - locked <0x00000000faca6ff8> (a java.lang.Class for org.apache.hadoop.conf.Configuration)
>>> >   at org.apache.hadoop.hdfs.HdfsConfiguration.<clinit>(HdfsConfiguration.java:34)
>>> >   at org.apache.hadoop.hdfs.DistributedFileSystem.<clinit>(DistributedFileSystem.java:110)
>>> >   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>>> >   at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>>> >   at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>>> >   at java.lang.reflect.Constructor.newInstance(Constructor.java:525)
>>> >   at java.lang.Class.newInstance0(Class.java:374)
>>> >   at java.lang.Class.newInstance(Class.java:327)
>>> >   at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:373)
>>> >   at java.util.ServiceLoader$1.next(ServiceLoader.java:445)
>>> >   at org.apache.hadoop.fs.FileSystem.loadFileSystems(FileSystem.java:2364)
>>> >   - locked <0x00000000faeb4fc8> (a java.lang.Class for org.apache.hadoop.fs.FileSystem)
>>> >   at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2375)
>>> >   at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2392)
>>> >   at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:89)
>>> >   at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2431)
>>> >   at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2413)
>>> >   at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:368)
>>> >   at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:167)
>>> >   at org.apache.hadoop.mapred.JobConf.getWorkingDirectory(JobConf.java:587)
>>> >   at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:315)
>>> >   at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:288)
>>> >   at org.apache.spark.SparkContext$$anonfun$22.apply(SparkContext.scala:546)
>>> >   at org.apache.spark.SparkContext$$anonfun$22.apply(SparkContext.scala:546)
>>> >   at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$1.apply(HadoopRDD.scala:145)
>>> >
>>> > ...elided...
>>> >
>>> > "Executor task launch worker-0" daemon prio=10 tid=0x0000000001e71800
>>> > nid=0x2d97 waiting for monitor entry [0x00007f24d2bf1000]
>>> >   java.lang.Thread.State: BLOCKED (on object monitor)
>>> >   at org.apache.hadoop.fs.FileSystem.loadFileSystems(FileSystem.java:2362)
>>> >   - waiting to lock <0x00000000faeb4fc8> (a java.lang.Class for org.apache.hadoop.fs.FileSystem)
>>> >   at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2375)
>>> >   at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2392)
>>> >   at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:89)
>>> >   at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2431)
>>> >   at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2413)
>>> >   at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:368)
>>> >   at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:167)
>>> >   at org.apache.hadoop.mapred.JobConf.getWorkingDirectory(JobConf.java:587)
>>> >   at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:315)
>>> >   at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:288)
>>> >   at org.apache.spark.SparkContext$$anonfun$22.apply(SparkContext.scala:546)
>>> >   at org.apache.spark.SparkContext$$anonfun$22.apply(SparkContext.scala:546)
>>> >   at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$1.apply(HadoopRDD.scala:145)
>>
>> --
>> Best Regards
>> Fei Wang
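Stripped of the Hadoop specifics, the dump shows a classic lock-order inversion: worker-1 holds the FileSystem class lock (0x00000000faeb4fc8) while waiting on the Configuration instance monitor (0x00000000fae7dc30), and worker-0 holds that Configuration monitor while waiting on the FileSystem class lock. The following is a minimal sketch of why a consistent acquisition order removes the cycle; the class name and the two stand-in lock objects are hypothetical, not the real Spark/Hadoop code.

```java
// Hypothetical sketch: two stand-in monitors for the two locks in the dump.
// The deadlock happens when the two threads take them in opposite order;
// here both take them in ONE global order, so no wait cycle can form.
public class LockOrderSketch {
    // Stand-in for the org.apache.hadoop.conf.Configuration instance monitor.
    private static final Object CONF_MONITOR = new Object();
    // Stand-in for the FileSystem class lock taken by loadFileSystems().
    private static final Object FS_CLASS_LOCK = new Object();

    // Every path acquires CONF_MONITOR before FS_CLASS_LOCK, so neither
    // thread can hold one lock while waiting on a holder of the other.
    private static void orderedPath() {
        synchronized (CONF_MONITOR) {
            synchronized (FS_CLASS_LOCK) {
                // ... work that needs both locks, e.g. building a JobConf ...
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Thread worker0 = new Thread(LockOrderSketch::orderedPath);
        Thread worker1 = new Thread(LockOrderSketch::orderedPath);
        worker0.start();
        worker1.start();
        worker0.join(5000);
        worker1.join(5000);
        System.out.println(worker0.isAlive() || worker1.isAlive()
                ? "deadlocked" : "completed");
    }
}
```

In the real case the inversion is harder to see because one of the acquisition orders comes from JVM class initialization (`HdfsConfiguration.<clinit>` reached via `ServiceLoader` under the FileSystem class lock), which is why the problem only shows up when two executor workers race through `getJobConf` at the same time.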