Aleksander Eskilson created SPARK-18016:
-------------------------------------------

             Summary: Code Generation Fails When Encoding Large Object to Wide Dataset
                 Key: SPARK-18016
                 URL: https://issues.apache.org/jira/browse/SPARK-18016
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.1.0
            Reporter: Aleksander Eskilson


When attempting to encode collections of large Java objects to Datasets having 
very wide or deeply nested schemas, code generation can fail, yielding:

{code}
Caused by: org.codehaus.janino.JaninoRuntimeException: Constant pool for class org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection has grown past JVM limit of 0xFFFF
        at org.codehaus.janino.util.ClassFile.addToConstantPool(ClassFile.java:499)
        at org.codehaus.janino.util.ClassFile.addConstantNameAndTypeInfo(ClassFile.java:439)
        at org.codehaus.janino.util.ClassFile.addConstantMethodrefInfo(ClassFile.java:358)
        at org.codehaus.janino.UnitCompiler.writeConstantMethodrefInfo(UnitCompiler.java:11114)
        at org.codehaus.janino.UnitCompiler.compileGet2(UnitCompiler.java:4547)
        at org.codehaus.janino.UnitCompiler.access$7500(UnitCompiler.java:206)
        at org.codehaus.janino.UnitCompiler$12.visitMethodInvocation(UnitCompiler.java:3774)
        at org.codehaus.janino.UnitCompiler$12.visitMethodInvocation(UnitCompiler.java:3762)
        at org.codehaus.janino.Java$MethodInvocation.accept(Java.java:4328)
        at org.codehaus.janino.UnitCompiler.compileGet(UnitCompiler.java:3762)
        at org.codehaus.janino.UnitCompiler.compileGetValue(UnitCompiler.java:4933)
        at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:3180)
        at org.codehaus.janino.UnitCompiler.access$5000(UnitCompiler.java:206)
        at org.codehaus.janino.UnitCompiler$9.visitMethodInvocation(UnitCompiler.java:3151)
        at org.codehaus.janino.UnitCompiler$9.visitMethodInvocation(UnitCompiler.java:3139)
        at org.codehaus.janino.Java$MethodInvocation.accept(Java.java:4328)
        at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:3139)
        at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:2112)
        at org.codehaus.janino.UnitCompiler.access$1700(UnitCompiler.java:206)
        at org.codehaus.janino.UnitCompiler$6.visitExpressionStatement(UnitCompiler.java:1377)
        at org.codehaus.janino.UnitCompiler$6.visitExpressionStatement(UnitCompiler.java:1370)
        at org.codehaus.janino.Java$ExpressionStatement.accept(Java.java:2558)
        at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:1370)
        at org.codehaus.janino.UnitCompiler.compileStatements(UnitCompiler.java:1450)
        at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:2811)
        at org.codehaus.janino.UnitCompiler.compileDeclaredMethods(UnitCompiler.java:1262)
        at org.codehaus.janino.UnitCompiler.compileDeclaredMethods(UnitCompiler.java:1234)
        at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:538)
        at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:890)
        at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:894)
        at org.codehaus.janino.UnitCompiler.access$600(UnitCompiler.java:206)
        at org.codehaus.janino.UnitCompiler$2.visitMemberClassDeclaration(UnitCompiler.java:377)
        at org.codehaus.janino.UnitCompiler$2.visitMemberClassDeclaration(UnitCompiler.java:369)
        at org.codehaus.janino.Java$MemberClassDeclaration.accept(Java.java:1128)
        at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:369)
        at org.codehaus.janino.UnitCompiler.compileDeclaredMemberTypes(UnitCompiler.java:1209)
        at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:564)
        at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:420)
        at org.codehaus.janino.UnitCompiler.access$400(UnitCompiler.java:206)
        at org.codehaus.janino.UnitCompiler$2.visitPackageMemberClassDeclaration(UnitCompiler.java:374)
        at org.codehaus.janino.UnitCompiler$2.visitPackageMemberClassDeclaration(UnitCompiler.java:369)
        at org.codehaus.janino.Java$AbstractPackageMemberClassDeclaration.accept(Java.java:1309)
        at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:369)
        at org.codehaus.janino.UnitCompiler.compileUnit(UnitCompiler.java:345)
        at org.codehaus.janino.SimpleCompiler.compileToClassLoader(SimpleCompiler.java:396)
        at org.codehaus.janino.ClassBodyEvaluator.compileToClass(ClassBodyEvaluator.java:311)
        at org.codehaus.janino.ClassBodyEvaluator.cook(ClassBodyEvaluator.java:229)
        at org.codehaus.janino.SimpleCompiler.cook(SimpleCompiler.java:196)
        at org.codehaus.commons.compiler.Cookable.cook(Cookable.java:91)
        at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.org$apache$spark$sql$catalyst$expressions$codegen$CodeGenerator$$doCompile(CodeGenerator.scala:905)
        ... 35 more
{code}

During generation of the code for SpecificUnsafeProjection, all the mutable 
variables are declared up front. If there are too many, it seems the generated 
class's constant pool grows past the JVM limit of 0xFFFF entries reported in 
the exception above.
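
To give a rough sense of scale, here is a back-of-the-envelope sketch in Java. 
It is not Spark or Janino code, and the class and method names are made up for 
illustration. It assumes each up-front variable becomes a uniquely named field 
that is referenced at least once and shares its type descriptor with the 
others, so it costs roughly three constant-pool entries (a Utf8 name, a 
NameAndType, and a Fieldref). Real generated classes also spend entries on 
method references, class names, and literals, so the practical ceiling is 
likely lower than this estimate suggests.

```java
// Rough estimate only. Assumption: every mutable-state field has a unique
// name, shares one descriptor with the other fields, and is referenced at
// least once in the generated class.
public final class ConstantPoolEstimate {
    // Limit quoted in the Janino exception message ("JVM limit of 0xFFFF").
    public static final int CONSTANT_POOL_LIMIT = 0xFFFF;

    // Assumed cost per field: Utf8 (name) + NameAndType + Fieldref.
    public static final int ENTRIES_PER_FIELD = 3;

    private ConstantPoolEstimate() {}

    /** Estimated constant-pool entries contributed by fieldCount fields. */
    public static long estimateEntries(long fieldCount) {
        return fieldCount * ENTRIES_PER_FIELD;
    }

    /**
     * True if the fields, added to the entries the class already uses
     * (baselineEntries), would push the pool past the limit.
     */
    public static boolean exceedsLimit(long fieldCount, long baselineEntries) {
        return baselineEntries + estimateEntries(fieldCount) > CONSTANT_POOL_LIMIT;
    }

    public static void main(String[] args) {
        // Print the estimate for a few hypothetical field counts.
        for (long n : new long[] {1_000, 10_000, 25_000}) {
            System.out.println(n + " fields: about " + estimateEntries(n)
                    + " entries, exceeds limit = " + exceedsLimit(n, 0));
        }
    }
}
```

Under these assumptions, a few tens of thousands of declared variables would 
be enough to exhaust the pool by themselves, which seems consistent with very 
wide or deeply nested schemas hitting the limit.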

This issue seems related to (but is not fixed by) SPARK-17702, which itself was 
about the size of individual methods growing beyond the 64 KB limit. 
SPARK-17702 was resolved by breaking extractions into smaller methods [1], but 
this issue looks to be about the sheer number of up-front declared variables 
[2]. 

I've created a small project [3] where I declare a list of "wide" and "nested" 
Bean objects that I attempt to encode to a Dataset. This code can trigger the 
failure on Spark 2.1.0-SNAPSHOT. I'll also attach the error log, which shows 
the generated code and the stack trace.

[1] - https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateUnsafeProjection.scala#L383
[2] - https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateUnsafeProjection.scala#L376
[3] - https://github.com/bdrillard/spark-codegen-error


