[ https://issues.apache.org/jira/browse/SPARK-18091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15722292#comment-15722292 ]
Apache Spark commented on SPARK-18091:
--------------------------------------
User 'kapilsingh5050' has created a pull request for this issue:
https://github.com/apache/spark/pull/16146
> Deep if expressions cause Generated SpecificUnsafeProjection code to exceed JVM code size limit
> -----------------------------------------------------------------------------------------------
>
> Key: SPARK-18091
> URL: https://issues.apache.org/jira/browse/SPARK-18091
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.6.1
> Reporter: Kapil Singh
> Assignee: Kapil Singh
> Priority: Critical
> Fix For: 2.0.3, 2.1.0
>
>
> *Problem Description:*
> I have an application that relies on a lot of if-else decision logic to
> generate its output. I'm getting the following exception:
> Caused by: org.codehaus.janino.JaninoRuntimeException: Code of method
> "(Lorg/apache/spark/sql/catalyst/expressions/GeneratedClass$SpecificUnsafeProjection;Lorg/apache/spark/sql/catalyst/InternalRow;)V"
> of class
> "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection"
> grows beyond 64 KB
> at org.codehaus.janino.CodeContext.makeSpace(CodeContext.java:941)
> at org.codehaus.janino.CodeContext.write(CodeContext.java:874)
> at org.codehaus.janino.CodeContext.writeBranch(CodeContext.java:965)
> at org.codehaus.janino.UnitCompiler.writeBranch(UnitCompiler.java:10261)
> *Steps to Reproduce:*
> I've come up with a unit test which I was able to run in
> CodeGenerationSuite.scala:
> {code}
> test("split large if expressions into blocks due to JVM code size limit") {
>   val row = create_row("afafFAFFsqcategory2dadDADcategory8sasasadscategory24", 0)
>   val inputStr = 'a.string.at(0)
>   val inputIdx = 'a.int.at(1)
>   val length = 10
>   val valuesToCompareTo = for (i <- 1 to (length + 1)) yield ("category" + i)
>   val initCondition = EqualTo(
>     RegExpExtract(inputStr, Literal("category1"), inputIdx), valuesToCompareTo(0))
>   var res: Expression = If(initCondition, Literal("category1"), Literal("NULL"))
>   var cummulativeCondition: Expression = Not(initCondition)
>   for (index <- 1 to length) {
>     val valueExtractedFromInput = RegExpExtract(
>       inputStr, Literal("category" + (index + 1).toString), inputIdx)
>     val currComparee = If(cummulativeCondition, valueExtractedFromInput, Literal("NULL"))
>     val currCondition = EqualTo(currComparee, valuesToCompareTo(index))
>     val combinedCond = And(cummulativeCondition, currCondition)
>     res = If(combinedCond, If(combinedCond, valueExtractedFromInput, Literal("NULL")), res)
>     cummulativeCondition = And(Not(currCondition), cummulativeCondition)
>   }
>   val expressions = Seq(res)
>   val plan = GenerateUnsafeProjection.generate(expressions, true)
>   val actual = plan(row).toSeq(expressions.map(_.dataType))
>   val expected = Seq(UTF8String.fromString("category2"))
>   if (!checkResult(actual, expected)) {
>     fail(s"Incorrect Evaluation: expressions: $expressions, actual: $actual, expected: $expected")
>   }
> }
> {code}
> *Root Cause:*
> The current splitting of projection code doesn't (and can't) split the
> generated code for an individual output column expression. The code generated
> for a single deeply nested expression can therefore grow beyond the JVM's
> 64 KB per-method limit.
> *Note:* This issue seems related to SPARK-14887, but I'm not sure whether the
> root cause is the same.
>
> *Proposed Fix:*
> The If expression should place the generated code for its predicate,
> true-value and false-value expressions in separate methods in the context,
> and call those methods instead of inlining all of that code in its own
> generated code.
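
The proposed fix can be illustrated with a toy code generator. This is only a sketch under stated assumptions: `Expr`, `Leaf`, `IfExpr`, `Ctx`, `addFunction` and `SplitIfSketch` are hypothetical names made up for this example and are not Spark's codegen API, and the character count of the emitted source is used as a rough stand-in for the bytecode size that the JVM actually limits.

```scala
// Toy model of expression code generation. None of these names are Spark APIs.
sealed trait Expr
final case class Leaf(code: String) extends Expr
final case class IfExpr(pred: Expr, whenTrue: Expr, whenFalse: Expr) extends Expr

// Collects helper methods, loosely playing the role of a codegen context.
final class Ctx {
  private val methods = scala.collection.mutable.ArrayBuffer.empty[String]

  // Registers `body` as a new helper method and returns a call to it.
  def addFunction(returnType: String, body: String): String = {
    val name = s"helper${methods.size}"
    methods += s"private $returnType $name() { $body }"
    s"$name()"
  }

  def helpers: Seq[String] = methods.toSeq
}

object SplitIfSketch {
  // Inline strategy: every child's code is pasted into the parent expression,
  // so one method body holds the code for the whole tree.
  def genInline(e: Expr): String = e match {
    case Leaf(code)      => code
    case IfExpr(p, t, f) => s"(${genInline(p)} ? ${genInline(t)} : ${genInline(f)})"
  }

  // Split strategy: the predicate, true value and false value of each If go
  // into their own helper methods; the If itself only emits three calls.
  def genSplit(e: Expr, ctx: Ctx): String = e match {
    case Leaf(code) => code
    case IfExpr(p, t, f) =>
      val predCall  = ctx.addFunction("boolean", s"return ${genSplit(p, ctx)};")
      val trueCall  = ctx.addFunction("Object", s"return ${genSplit(t, ctx)};")
      val falseCall = ctx.addFunction("Object", s"return ${genSplit(f, ctx)};")
      s"($predCall ? $trueCall : $falseCall)"
  }

  // Builds an If chain nested `depth` levels deep through the false branch.
  def nested(depth: Int): Expr =
    (1 to depth).foldLeft(Leaf("nullValue"): Expr) { (acc, i) =>
      IfExpr(Leaf(s"matches($i)"), Leaf(s"value$i"), acc)
    }
}
```

With `genInline`, the single emitted expression grows with the depth of the tree. With `genSplit`, the top-level expression and every helper body stay at a roughly constant size regardless of depth, at the cost of extra method calls.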