Hi all,
When selecting large data in Spark SQL (a SELECT * query), I see a buffer overflow exception from Kryo:

15/03/27 10:32:19 WARN scheduler.TaskSetManager: Lost task 6.0 in stage 3.0 (TID 30, machine159): com.esotericsoftware.kryo.KryoException: Buffer overflow. Available: 1, required: 2
Serialization trace:
values (org.apache.spark.sql.catalyst.expressions.GenericRow)
        at com.esotericsoftware.kryo.io.Output.require(Output.java:138)
        at com.esotericsoftware.kryo.io.Output.writeInt(Output.java:247)
        at com.esotericsoftware.kryo.serializers.DefaultSerializers$IntSerializer.write(DefaultSerializers.java:95)
        at com.esotericsoftware.kryo.serializers.DefaultSerializers$IntSerializer.write(DefaultSerializers.java:89)
        at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:568)
        at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.write(DefaultArraySerializers.java:318)
        at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.write(DefaultArraySerializers.java:293)
        at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:501)
        at com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.write(FieldSerializer.java:564)
        at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:213)
        at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:568)
        at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.write(DefaultArraySerializers.java:318)
        at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.write(DefaultArraySerializers.java:293)
        at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:568)
        at org.apache.spark.serializer.KryoSerializerInstance.serialize(KryoSerializer.scala:167)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:210)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)

I thought increasing these settings would resolve the problem, but the same exception is still seen:

set spark.kryoserializer.buffer.mb=4;
set spark.kryoserializer.buffer.max.mb=1024;

I have a Parquet table with 5 Int columns and 100 million rows. Can somebody explain why this exception is seen? Am I missing some configuration?

Thanks,
Yash
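One detail worth noting: serializer properties like these generally have to be in place before the SparkContext and its executors start, so a SET issued at runtime from the SQL shell may arrive too late to take effect, which could by itself explain why the exception persists. Below is a minimal sketch, not a tested fix, of applying the same properties up front on the SparkConf. The object name and table name are hypothetical, and on Spark 1.4+ the two buffer properties were renamed to spark.kryoserializer.buffer and spark.kryoserializer.buffer.max, taking size strings such as "1g".

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Hypothetical driver: the point is that the Kryo buffer settings are
// applied before the SparkContext (and hence the executors) come up.
object KryoBufferSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("kryo-buffer-sketch")
      .setMaster("local[*]")  // local master only to keep the sketch self-contained
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      // Spark 1.3-era property names, sized in MB, matching the SET commands above:
      .set("spark.kryoserializer.buffer.mb", "4")
      .set("spark.kryoserializer.buffer.max.mb", "1024")
    // On Spark 1.4+ the renamed equivalents take size strings instead:
    //   .set("spark.kryoserializer.buffer", "4m")
    //   .set("spark.kryoserializer.buffer.max", "1g")

    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    // "wide_parquet_table" is a placeholder; point it at real data first, e.g.
    // sqlContext.sql("SELECT * FROM wide_parquet_table").collect()

    sc.stop()
  }
}

Putting the same two properties in spark-defaults.conf, or passing them with --conf to spark-submit, should have the same effect as setting them on the SparkConf here.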
On Mon, Mar 30, 2015 at 3:05 AM, Sean Owen <so...@cloudera.com> wrote:
> Given that it's an internal error from scalac, I think it may be
> something to take up with the Scala folks to really fix. We can just
> look for workarounds. Try blowing away your .m2 and .ivy cache, for
> example. FWIW I was running on Linux with Java 8u31, latest Scala 2.11
> AFAIK.
>
> On Sun, Mar 29, 2015 at 10:29 PM, Pala M Muthaia
> <mchett...@rocketfuelinc.com> wrote:
> > Sean,
> >
> > I did a mvn clean and then a build; it produces the same error. I also
> > did a fresh git clone of Spark and invoked the same build command, and
> > it resulted in an identical error. (I also had a colleague do the same
> > thing, lest there was some machine-specific issue, and saw the same
> > error.) Unless I misunderstood something, it doesn't look like a clean
> > build fixes this.
> >
> > On Fri, Mar 27, 2015 at 10:20 PM, Sean Owen <so...@cloudera.com> wrote:
> >> This is not a compile error, but an error from the scalac compiler.
> >> That is, the code and build are fine, but scalac is not compiling it.
> >> Usually when this happens, a clean build fixes it.
> >>
> >> On Fri, Mar 27, 2015 at 7:09 PM, Pala M Muthaia
> >> <mchett...@rocketfuelinc.com> wrote:
> >> > No, I am running from the root directory, the parent of core.
> >> >
> >> > Here is the first set of errors that I see when I compile from source
> >> > (sorry, the error message is very long, but I am adding it in case it
> >> > helps in diagnosis). After I manually add the javax.servlet dependency
> >> > for version 3.0, this set of errors goes away and I get the next set
> >> > of errors about missing classes under eclipse-jetty.
> >> >
> >> > I am on Maven 3.2.5 and Java 1.7.
> >> >
> >> > Error:
> >> >
> >> > [INFO] --- scala-maven-plugin:3.2.0:compile (scala-compile-first) @ spark-core_2.10 ---
> >> > [WARNING] Zinc server is not available at port 3030 - reverting to normal incremental compile
> >> > [INFO] Using incremental compilation
> >> > [INFO] compiler plugin: BasicArtifact(org.scalamacros,paradise_2.10.4,2.0.1,null)
> >> > [INFO] Compiling 403 Scala sources and 33 Java sources to /Users/mchettiar/code/spark/core/target/scala-2.10/classes...
> >> > [WARNING] Class javax.servlet.ServletException not found - continuing with a stub.
> >> > [ERROR]
> >> >      while compiling: /Users/mchettiar/code/spark/core/src/main/scala/org/apache/spark/HttpServer.scala
> >> >         during phase: typer
> >> >      library version: version 2.10.4
> >> >     compiler version: version 2.10.4
> >> >   reconstructed args: -deprecation -feature -

--
When events unfold with calm and ease
When the winds that blow are merely breeze
Learn from nature, from birds and bees
Live your life in love, and let joy not cease.