[
https://issues.apache.org/jira/browse/SPARK-17936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15575945#comment-15575945
]
Justin Miller commented on SPARK-17936:
---------------------------------------
Hey Sean,
I did a bit more digging this morning looking at SpecificUnsafeProjection and
saw this commit:
https://github.com/apache/spark/commit/b1b47274bfeba17a9e4e9acebd7385289f31f6c8
I tried running with 2.1.0-SNAPSHOT to see how things went, and it appears to
work great now!
[Stage 1:> (0 + 8) / 8]11:28:33.237 INFO c.p.o.ObservationPersister - (ObservationPersister) - Thrift Parse Success: 0 / Thrift Parse Errors: 0
[Stage 3:> (0 + 8) / 8]11:29:03.236 INFO c.p.o.ObservationPersister - (ObservationPersister) - Thrift Parse Success: 89 / Thrift Parse Errors: 0
[Stage 5:> (4 + 4) / 8]11:29:33.237 INFO c.p.o.ObservationPersister - (ObservationPersister) - Thrift Parse Success: 205 / Thrift Parse Errors: 0
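For anyone wanting to reproduce this, a minimal sketch of the build change might
look like the following. It assumes an sbt project on Scala 2.11; the Scala
version and the module list are illustrative assumptions rather than details
taken from the actual project, and the snapshot artifacts are only resolvable
while they remain published.

```scala
// build.sbt (sketch): resolve Spark from the Apache snapshot repository.
// Assumption: the project is built with sbt against Scala 2.11.
scalaVersion := "2.11.8"

// Apache's snapshot repository, where -SNAPSHOT builds are published.
resolvers += "Apache Snapshots" at "https://repository.apache.org/snapshots/"

// Version under test, as mentioned above.
val sparkVersion = "2.1.0-SNAPSHOT"

// Hypothetical module list; match whichever Spark modules the build already uses.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-sql"       % sparkVersion,
  "org.apache.spark" %% "spark-streaming" % sparkVersion
)
```

Snapshot builds change from night to night, so this is only suitable for testing,
not for a production deployment.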
Since we're still testing this out, that snapshot works fine for now. Do you
know when 2.1.0 might be generally available?
Best,
Justin
> "CodeGenerator - failed to compile:
> org.codehaus.janino.JaninoRuntimeException: Code of" method Error
> -----------------------------------------------------------------------------------------------------
>
> Key: SPARK-17936
> URL: https://issues.apache.org/jira/browse/SPARK-17936
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 2.0.1
> Reporter: Justin Miller
>
> Greetings. I'm currently in the process of migrating a project I'm working on
> from Spark 1.6.2 to 2.0.1. The project uses Spark Streaming to convert Thrift
> structs coming from Kafka into Parquet files stored in S3. This conversion
> process works fine in 1.6.2 but I think there may be a bug in 2.0.1. I'll
> paste the stack trace below.
> org.codehaus.janino.JaninoRuntimeException: Code of method "(Lorg/apache/spark/sql/catalyst/expressions/GeneratedClass;[Ljava/lang/Object;)V" of class "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection" grows beyond 64 KB
>     at org.codehaus.janino.CodeContext.makeSpace(CodeContext.java:941)
>     at org.codehaus.janino.CodeContext.write(CodeContext.java:854)
>     at org.codehaus.janino.UnitCompiler.writeShort(UnitCompiler.java:10242)
>     at org.codehaus.janino.UnitCompiler.writeLdc(UnitCompiler.java:9058)
> Also, later on:
> 07:35:30.191 ERROR o.a.s.u.SparkUncaughtExceptionHandler - Uncaught exception in thread Thread[Executor task launch worker-6,5,run-main-group-0]
> java.lang.OutOfMemoryError: Java heap space
> I've seen similar issues posted, but those were always on the query side. I
> have a hunch that this is happening at write time as the error occurs after
> batchDuration. Here's the write snippet.
> stream
>   .flatMap {
>     case Success(row) =>
>       thriftParseSuccess += 1
>       Some(row)
>     case Failure(ex) =>
>       thriftParseErrors += 1
>       logger.error("Error during deserialization: ", ex)
>       None
>   }.foreachRDD { rdd =>
>     val sqlContext = SQLContext.getOrCreate(rdd.context)
>     transformer(sqlContext.createDataFrame(rdd, converter.schema))
>       .coalesce(coalesceSize)
>       .write
>       .mode(Append)
>       .partitionBy(partitioning: _*)
>       .parquet(parquetPath)
>   }
> Please let me know if you can be of assistance and if there's anything I can
> do to help.
> Best,
> Justin
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)