Sen Fang created SPARK-8443:
-------------------------------

             Summary: GenerateMutableProjection Exceeds JVM Code Size Limits
                 Key: SPARK-8443
                 URL: https://issues.apache.org/jira/browse/SPARK-8443
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 1.4.0
            Reporter: Sen Fang


GenerateMutableProjection puts the code for all expression columns into a
single apply function. When there are many columns, the code size of the apply
function exceeds 64 KB, which is a hard JVM limit on the size of a single
method and cannot be changed.

This came up when we were aggregating about 100 columns with the codegen and
unsafe features enabled.

I wrote a unit test that reproduces this issue:
https://github.com/saurfang/spark/blob/codegen_size_limit/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CodeGenerationSuite.scala

This test currently fails at 2048 expressions. Master seems to be more
tolerant than branch-1.4 in this respect because the code is more concise.

Although the code on master has changed since branch-1.4, I am able to
reproduce the problem on master. For now I have hacked around this in
branch-1.4 by wrapping each expression in a separate function and then calling
those functions sequentially in apply. The proper fix is probably to check the
length of projectCode and break it up as necessary. (This actually seems easier
on master, since there we build the code as strings rather than with
quasiquotes.)
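
To make the "break it up as necessary" idea concrete, here is a rough sketch
in plain Java. It is not Spark code: the class name, the apply_N helper names,
the Object row parameter, and the use of source length in characters as a
stand-in for bytecode size are all assumptions made for illustration. It groups
the per-expression code snippets into chunks, emits one helper method per
chunk, and has apply call the helpers in order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; none of these names exist in Spark.
final class SplitProjectionCode {

    // Groups snippets so that each group's total source length stays at or
    // below maxChars. A single snippet longer than maxChars gets its own group.
    static List<List<String>> chunk(List<String> snippets, int maxChars) {
        List<List<String>> groups = new ArrayList<>();
        List<String> current = new ArrayList<>();
        int size = 0;
        for (String s : snippets) {
            if (!current.isEmpty() && size + s.length() > maxChars) {
                groups.add(current);
                current = new ArrayList<>();
                size = 0;
            }
            current.add(s);
            size += s.length();
        }
        if (!current.isEmpty()) {
            groups.add(current);
        }
        return groups;
    }

    // Emits one private helper per group, plus an apply method that calls
    // the helpers sequentially.
    static String generate(List<String> snippets, int maxChars) {
        List<List<String>> groups = chunk(snippets, maxChars);
        StringBuilder helpers = new StringBuilder();
        StringBuilder calls = new StringBuilder();
        for (int i = 0; i < groups.size(); i++) {
            helpers.append("private void apply_").append(i)
                   .append("(Object row) {\n");
            for (String s : groups.get(i)) {
                helpers.append("  ").append(s).append("\n");
            }
            helpers.append("}\n\n");
            calls.append("  apply_").append(i).append("(row);\n");
        }
        return helpers + "public void apply(Object row) {\n" + calls + "}\n";
    }

    public static void main(String[] args) {
        // Placeholder snippets standing in for per-expression generated code.
        List<String> snippets = Arrays.asList(
            "evalColumn0(row);", "evalColumn1(row);", "evalColumn2(row);");
        System.out.println(generate(snippets, 40));
    }
}
```

Source length is only a proxy here, since the JVM limit applies to compiled
bytecode, so a real threshold would have to be chosen conservatively. Any state
shared between expressions would also need to live in fields rather than in
locals of apply.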

Let me know if anyone has additional thoughts on this. I'm happy to contribute
a pull request.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
