Patrick: I don't think this was caused by a recent merge -- pretty sure I was seeing it last week.
Mark: Are you sure the examples assembly is hanging, as opposed to just taking a long time? It takes ~30 minutes on my machine (not doubting that the Java version update fixes it -- just pointing out that if you wait, it may actually finish).
Evan: One thing to note is that the log message is wrong (see https://github.com/apache/incubator-spark/pull/126): the task is actually failing just once, not 4 times. Doesn't help fix the issue -- but just thought I'd point it out in case anyone else is trying to look into this.

On Wed, Oct 30, 2013 at 2:08 PM, Patrick Wendell <pwend...@gmail.com> wrote:
> This may have been caused by a recent merge since a bunch of people
> independently hit it in the last 48 hours.
>
> One debugging step would be to narrow it down to which merge caused
> it. I don't have time personally today, but just a suggestion for ppl
> for whom this is blocking progress.
>
> - Patrick
>
> On Wed, Oct 30, 2013 at 1:44 PM, Mark Hamstra <m...@clearstorydata.com> wrote:
> > What JDK version are you using, Evan?
> >
> > I tried to reproduce your problem earlier today, but I wasn't even able to
> > get through the assembly build -- kept hanging when trying to build the
> > examples assembly. Foregoing the assembly and running the tests would hang
> > on FileServerSuite "Dynamically adding JARS locally" -- no stack trace,
> > just hung. And I was actually seeing a very similar stack trace to yours
> > from a test suite of our own running against 0.8.1-SNAPSHOT -- not exactly
> > the same because line numbers were different once it went into the java
> > runtime, and it eventually ended up someplace a little different. That got
> > me curious about differences in Java versions, so I updated to the latest
> > Oracle release (1.7.0_45). Now it cruises right through the build and test
> > of Spark master from before Matei merged your PR.
> > Then I logged into a machine that has 1.7.0_15
> > (7u15-2.3.7-0ubuntu1~11.10.1, actually) installed, and I'm right back to
> > the hanging during the examples assembly (but passes FileServerSuite,
> > oddly enough.) Upgrading the JDK didn't improve the results of the
> > ClearStory test suite I was looking at, so my misery isn't over; but yours
> > might be with a newer JDK....
> >
> > On Wed, Oct 30, 2013 at 12:44 PM, Evan Chan <e...@ooyala.com> wrote:
> >
> >> Must be a local environment thing, because AmpLab Jenkins can't
> >> reproduce it..... :-p
> >>
> >> On Wed, Oct 30, 2013 at 11:10 AM, Josh Rosen <rosenvi...@gmail.com> wrote:
> >> > Someone on the users list also encountered this exception:
> >> >
> >> > https://mail-archives.apache.org/mod_mbox/incubator-spark-user/201310.mbox/%3C64474308D680D540A4D8151B0F7C03F7025EF289%40SHSMSX104.ccr.corp.intel.com%3E
> >> >
> >> > On Wed, Oct 30, 2013 at 9:40 AM, Evan Chan <e...@ooyala.com> wrote:
> >> >
> >> >> I'm at the latest
> >> >>
> >> >> commit f0e23a023ce1356bc0f04248605c48d4d08c2d05
> >> >> Merge: aec9bf9 a197137
> >> >> Author: Reynold Xin <r...@apache.org>
> >> >> Date: Tue Oct 29 01:41:44 2013 -0400
> >> >>
> >> >> and seeing this when I do a "test-only FileServerSuite":
> >> >>
> >> >> 13/10/30 09:35:04.300 INFO DAGScheduler: Completed ResultTask(0, 0)
> >> >> 13/10/30 09:35:04.307 INFO LocalTaskSetManager: Loss was due to java.io.StreamCorruptedException
> >> >> java.io.StreamCorruptedException: invalid type code: AC
> >> >>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353)
> >> >>     at java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
> >> >>     at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:39)
> >> >>     at org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:101)
> >> >>     at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71)
> >> >>     at scala.collection.Iterator$$anon$21.hasNext(Iterator.scala:440)
> >> >>     at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:26)
> >> >>     at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:27)
> >> >>     at org.apache.spark.Aggregator.combineCombinersByKey(Aggregator.scala:53)
> >> >>     at org.apache.spark.rdd.PairRDDFunctions$$anonfun$combineByKey$2.apply(PairRDDFunctions.scala:95)
> >> >>     at org.apache.spark.rdd.PairRDDFunctions$$anonfun$combineByKey$2.apply(PairRDDFunctions.scala:94)
> >> >>     at org.apache.spark.rdd.MapPartitionsWithContextRDD.compute(MapPartitionsWithContextRDD.scala:40)
> >> >>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:237)
> >> >>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:226)
> >> >>     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:107)
> >> >>     at org.apache.spark.scheduler.Task.run(Task.scala:53)
> >> >>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:212)
> >> >>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
> >> >>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
> >> >>     at java.lang.Thread.run(Thread.java:680)
> >> >>
> >> >> Anybody else seen this yet?
> >> >>
> >> >> I have a really simple PR and this fails without my change, so I may
> >> >> go ahead and submit it anyways.
> >> >>
> >> >> --
> >> >> --
> >> >> Evan Chan
> >> >> Staff Engineer
> >> >> e...@ooyala.com |
> >>
> >> --
> >> --
> >> Evan Chan
> >> Staff Engineer
> >> e...@ooyala.com |