Hi all,

When selecting a large amount of data in Spark SQL (a SELECT * query), I
see a buffer overflow exception from Kryo:


15/03/27 10:32:19 WARN scheduler.TaskSetManager: Lost task 6.0 in stage 3.0 (TID 30, machine159): com.esotericsoftware.kryo.KryoException: Buffer overflow. Available: 1, required: 2
Serialization trace:
values (org.apache.spark.sql.catalyst.expressions.GenericRow)
        at com.esotericsoftware.kryo.io.Output.require(Output.java:138)
        at com.esotericsoftware.kryo.io.Output.writeInt(Output.java:247)
        at com.esotericsoftware.kryo.serializers.DefaultSerializers$IntSerializer.write(DefaultSerializers.java:95)
        at com.esotericsoftware.kryo.serializers.DefaultSerializers$IntSerializer.write(DefaultSerializers.java:89)
        at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:568)
        at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.write(DefaultArraySerializers.java:318)
        at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.write(DefaultArraySerializers.java:293)
        at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:501)
        at com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.write(FieldSerializer.java:564)
        at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:213)
        at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:568)
        at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.write(DefaultArraySerializers.java:318)
        at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.write(DefaultArraySerializers.java:293)
        at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:568)
        at org.apache.spark.serializer.KryoSerializerInstance.serialize(KryoSerializer.scala:167)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:210)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)



I thought increasing the following settings would resolve the problem, but
the same exception is still thrown:

set spark.kryoserializer.buffer.mb=4;
set spark.kryoserializer.buffer.max.mb=1024;
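
In case it matters, here is a minimal sketch of how I would expect the
equivalent settings to be applied through SparkConf before the
SparkContext is created (the app name is a placeholder of mine; the
config names are the same ones I set above):

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("kryo-buffer-test")  // placeholder app name
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryoserializer.buffer.mb", "4")         // initial Kryo buffer, in MB
  .set("spark.kryoserializer.buffer.max.mb", "1024")  // maximum Kryo buffer, in MB
val sc = new SparkContext(conf)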


I have a Parquet table with 5 Int columns and 100 million rows.
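
The query itself is nothing more than the following (a minimal,
hypothetical sketch of the equivalent programmatic version; the table
path is a placeholder):

import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)
val df = sqlContext.parquetFile("/path/to/table")  // 5 Int columns, ~100M rows
df.registerTempTable("t")
sqlContext.sql("SELECT * FROM t").collect()  // this is where the Kryo overflow appears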

Can somebody explain why this exception occurs? Am I missing some
configuration?

Thanks
Yash


On Mon, Mar 30, 2015 at 3:05 AM, Sean Owen <so...@cloudera.com> wrote:

> Given that it's an internal error from scalac, I think it may be
> something to take up with the Scala folks to really fix. We can just
> look for workarounds. Try blowing away your .m2 and .ivy caches, for
> example. FWIW, I was running on Linux with Java 8u31 and the latest
> Scala 2.11, AFAIK.
>
> On Sun, Mar 29, 2015 at 10:29 PM, Pala M Muthaia
> <mchett...@rocketfuelinc.com> wrote:
> > Sean,
> >
> > I did a mvn clean and then a build, and it produces the same error. I
> > also did a fresh git clone of Spark and invoked the same build command,
> > and it resulted in an identical error (I also had a colleague do the
> > same thing, in case there was some machine-specific issue, and saw the
> > same error). Unless I misunderstood something, it doesn't look like a
> > clean build fixes this.
> >
> > On Fri, Mar 27, 2015 at 10:20 PM, Sean Owen <so...@cloudera.com> wrote:
> >>
> >> This is not an error in your code, but an internal error from the
> >> scalac compiler itself. That is, the code and build are fine, but
> >> scalac is failing to compile them. Usually when this happens, a clean
> >> build fixes it.
> >>
> >> On Fri, Mar 27, 2015 at 7:09 PM, Pala M Muthaia
> >> <mchett...@rocketfuelinc.com> wrote:
> >> > No, I am running from the root directory, the parent of core.
> >> >
> >> > Here is the first set of errors that I see when I compile from
> >> > source (sorry the error message is very long, but I am adding it in
> >> > case it helps with diagnosis). After I manually add the
> >> > javax.servlet dependency for version 3.0, this set of errors goes
> >> > away and I get the next set of errors, about missing classes under
> >> > eclipse-jetty.
> >> >
> >> > I am on Maven 3.2.5 and Java 1.7.
> >> >
> >> > Error:
> >> >
> >> > [INFO] --- scala-maven-plugin:3.2.0:compile (scala-compile-first) @ spark-core_2.10 ---
> >> > [WARNING] Zinc server is not available at port 3030 - reverting to normal incremental compile
> >> > [INFO] Using incremental compilation
> >> > [INFO] compiler plugin: BasicArtifact(org.scalamacros,paradise_2.10.4,2.0.1,null)
> >> > [INFO] Compiling 403 Scala sources and 33 Java sources to /Users/mchettiar/code/spark/core/target/scala-2.10/classes...
> >> > [WARNING] Class javax.servlet.ServletException not found - continuing with a stub.
> >> > [ERROR]
> >> >      while compiling: /Users/mchettiar/code/spark/core/src/main/scala/org/apache/spark/HttpServer.scala
> >> >         during phase: typer
> >> >      library version: version 2.10.4
> >> >     compiler version: version 2.10.4
> >> >   reconstructed args: -deprecation -feature
> >> > -


-- 
When events unfold with calm and ease
When the winds that blow are merely breeze
Learn from nature, from birds and bees
Live your life in love, and let joy not cease.
