Sen Fang created SPARK-8443:
-------------------------------
Summary: GenerateMutableProjection Exceeds JVM Code Size Limits
Key: SPARK-8443
URL: https://issues.apache.org/jira/browse/SPARK-8443
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 1.4.0
Reporter: Sen Fang
GenerateMutableProjection puts all expression columns into a single apply
function. When there are many columns, the bytecode of that apply function
exceeds the JVM's 64 KB per-method limit, a hard limit that cannot be changed.
This came up when we were aggregating about 100 columns with the codegen and
unsafe features enabled.
I wrote a unit test that reproduces this issue:
https://github.com/saurfang/spark/blob/codegen_size_limit/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CodeGenerationSuite.scala
This test currently fails at 2048 expressions. Master seems more tolerant of
this than branch-1.4 because its generated code is more concise.
Although the code on master has changed since branch-1.4, I can still reproduce
the problem on master. For now, I have worked around it in branch-1.4 by
wrapping each expression in a separate function and then calling those
functions sequentially from apply. The proper fix is probably to check the
length of projectCode and break it up as necessary. (This actually seems easier
on master, since it builds code with strings rather than quasiquotes.)
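The length-based split suggested above can be sketched in Scala as follows. This is a minimal sketch, not Spark's implementation: `SplitCodegen`, `chunk`, `splitIntoMethods`, the `maxChars` threshold, and the emitted Java names (`apply_N`, `i`, `mutableRow`) are all hypothetical. It only assembles source strings, it assumes each expression has already been rendered as a Java statement, and it uses source length as a rough proxy for bytecode size, which is what the 64 KB limit actually measures.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical helper: splits per-expression code snippets into several
// helper methods so no single generated method body grows too large.
object SplitCodegen {

  // Groups snippets greedily so each chunk's total source length stays
  // within maxChars. A single oversized snippet still gets its own chunk.
  def chunk(snippets: Seq[String], maxChars: Int): Seq[Seq[String]] = {
    val chunks = ArrayBuffer.empty[Seq[String]]
    val current = ArrayBuffer.empty[String]
    var size = 0
    for (s <- snippets) {
      // Close the current chunk if adding s would exceed the threshold.
      if (current.nonEmpty && size + s.length > maxChars) {
        chunks += current.toList
        current.clear()
        size = 0
      }
      current += s
      size += s.length
    }
    if (current.nonEmpty) chunks += current.toList
    chunks.toList
  }

  // Emits one private helper per chunk, plus an apply method that calls
  // the helpers in order. The Java names here are placeholders only.
  def splitIntoMethods(snippets: Seq[String], maxChars: Int): String = {
    val groups = chunk(snippets, maxChars)
    val helpers = groups.zipWithIndex.map { case (body, n) =>
      "private void apply_" + n + "(Object i) {\n" + body.mkString("\n") + "\n}"
    }
    val calls = groups.indices.map(n => "  apply_" + n + "(i);").mkString("\n")
    val apply = "public Object apply(Object i) {\n" + calls + "\n  return mutableRow;\n}"
    (helpers :+ apply).mkString("\n\n")
  }
}
```

With `maxChars = 0` every snippet ends up in its own helper, which matches the per-expression workaround; a larger threshold packs several expressions into each helper and produces fewer methods.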
Let me know if anyone has additional thoughts on this; I'm happy to contribute
a pull request.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)