Hi Burak.

I always see this error. I'm running the CDH 5.2 build of Spark 1.1.0 and I
load my data from HDFS. By the time it reaches the recommender, the data has
gone through many Spark operations.
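
For reference, the load path is roughly the following (a sketch only; the
path and the parsing are placeholders, and the real job applies several more
joins and maps before the ratings reach the recommender):

    import org.apache.spark.mllib.recommendation.Rating

    // sc is the driver's SparkContext; the raw input lives on HDFS
    val ratings = sc.textFile("hdfs:///placeholder/ratings")
      .map(_.split(','))
      .map(f => Rating(f(0).toInt, f(1).toInt, f(2).toDouble)) // user, product, rating
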
On Oct 27, 2014 4:03 PM, "Burak Yavuz" <bya...@stanford.edu> wrote:

> Hi,
>
> I've come across this multiple times, but not in a consistent manner, and
> I've found it hard to reproduce. I have a JIRA for it: SPARK-3080
>
> Do you observe this error every single time? Where do you load your data
> from? Which version of Spark are you running?
> Figuring out the similarities may help in pinpointing the bug.
>
> Thanks,
> Burak
>
> ----- Original Message -----
> From: "Ilya Ganelin" <ilgan...@gmail.com>
> To: "user" <user@spark.apache.org>
> Sent: Monday, October 27, 2014 11:36:46 AM
> Subject: MLLib ALS ArrayIndexOutOfBoundsException with Scala Spark 1.1.0
>
> Hello all - I am attempting to run MLlib's ALS algorithm on a substantial
> test set of approximately 200 million records.
>
> I have resolved a few issues I had with garbage collection, Kryo
> serialization, and memory usage.
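>
> For context, the ALS call itself looks roughly like the following (a sketch
> only; the rank/iterations/lambda values are placeholders rather than my
> exact parameters, and `ratings` is the RDD[Rating] built upstream):
>
>     import org.apache.spark.mllib.recommendation.ALS
>
>     // placeholder hyperparameters: rank = 20, iterations = 10, lambda = 0.01
>     val model = ALS.train(ratings, 20, 10, 0.01)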
>
> However, I have not been able to get around the issue shown below:
>
>
> > java.lang.ArrayIndexOutOfBoundsException: 6106
> >         org.apache.spark.mllib.recommendation.ALS$$anonfun$org$apache$spark$mllib$recommendation$ALS$$updateBlock$1.apply$mcVI$sp(ALS.scala:543)
> >         scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
> >         org.apache.spark.mllib.recommendation.ALS.org$apache$spark$mllib$recommendation$ALS$$updateBlock(ALS.scala:537)
> >         org.apache.spark.mllib.recommendation.ALS$$anonfun$org$apache$spark$mllib$recommendation$ALS$$updateFeatures$2.apply(ALS.scala:505)
> >         org.apache.spark.mllib.recommendation.ALS$$anonfun$org$apache$spark$mllib$recommendation$ALS$$updateFeatures$2.apply(ALS.scala:504)
> >         org.apache.spark.rdd.MappedValuesRDD$$anonfun$compute$1.apply(MappedValuesRDD.scala:31)
> >         org.apache.spark.rdd.MappedValuesRDD$$anonfun$compute$1.apply(MappedValuesRDD.scala:31)
> >         scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
> >         scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
> >         org.apache.spark.util.collection.ExternalAppendOnlyMap.insertAll(ExternalAppendOnlyMap.scala:144)
> >         org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$5.apply(CoGroupedRDD.scala:159)
> >         org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$5.apply(CoGroupedRDD.scala:158)
> >         scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
> >         scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> >         scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
> >         scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
> >         org.apache.spark.rdd.CoGroupedRDD.compute(CoGroupedRDD.scala:158)
> >         org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
> >         org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
> >         org.apache.spark.rdd.MappedValuesRDD.compute(MappedValuesRDD.scala:31)
> >         org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
> >         org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>
>
> I do not have any negative indices or indices that exceed Int.MaxValue.
>
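> That claim is based on a check along these lines (a sketch; `ratings` stands
> for the final RDD[Rating] that is fed into ALS):
>
>     // count ratings whose user or product id is negative; the ids are Ints,
>     // so anything that parsed successfully already fits in Int range
>     val badIds = ratings.filter(r => r.user < 0 || r.product < 0).count()
>     require(badIds == 0, s"found $badIds ratings with negative ids")
>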
> I have partitioned the input data into 300 partitions and my Spark config
> is below:
>
> .set("spark.executor.memory", "14g")
>       .set("spark.storage.memoryFraction", "0.8")
>       .set("spark.serializer",
> "org.apache.spark.serializer.KryoSerializer")
>       .set("spark.kryo.registrator", "MyRegistrator")
>       .set("spark.core.connection.ack.wait.timeout","600")
>       .set("spark.akka.frameSize","50")
>       .set("spark.yarn.executor.memoryOverhead","1024")
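>
> For context, MyRegistrator is roughly of this shape (a sketch; the actual
> class registers the full set of record types used by the job, which I have
> omitted here):
>
>     import com.esotericsoftware.kryo.Kryo
>     import org.apache.spark.mllib.recommendation.Rating
>     import org.apache.spark.serializer.KryoRegistrator
>
>     class MyRegistrator extends KryoRegistrator {
>       override def registerClasses(kryo: Kryo) {
>         // register the record types that flow through the shuffle
>         kryo.register(classOf[Rating])
>       }
>     }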
>
> Does anyone have any suggestions as to why I'm seeing the above error, or
> how to work around it?
> It may be possible to upgrade to the latest version of Spark, but the
> mechanism for doing so in our environment isn't obvious yet.
>
> -Ilya Ganelin
>
>
