[
https://issues.apache.org/jira/browse/SPARK-13431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15159649#comment-15159649
]
Kazuaki Ishizaki commented on SPARK-13431:
------------------------------------------
I identified why this problem occurs only with Maven. The Maven Shade plugin
increases the bytecode length of a method. The increase happens because the
Shade plugin rewrites the class file and rebuilds its constant pool.
Here is the output of ``javap -c SparkSqlParser_ExpressionParser.class`` before
applying the Shade plugin. The static initializer ``static{}`` uses the ``ldc``
instruction to load from the constant pool at offsets 13, 18, and 23. Each
``ldc`` occupies only two bytes. As a result, the bytecode length of this
method is *less than 65536*.
{code}
public class org.apache.spark.sql.catalyst.parser.SparkSqlParser_ExpressionParser extends org.antlr.runtime.Parser {
...
static {};
Code:
0: bipush 70
2: anewarray #1035 // class java/lang/String
5: dup
6: iconst_0
7: ldc_w #1036 // String ...
10: aastore
11: dup
12: iconst_1
13: ldc #127 // String
15: aastore
16: dup
17: iconst_2
18: ldc #127 // String
20: aastore
21: dup
22: iconst_3
23: ldc #127 // String
25: aastore
...
59900: return
}
}
{code}
After applying the Shade plugin, the static initializer ``static{}`` uses the
``ldc_w`` instruction to load from the constant pool at offsets 13, 19, and 25.
Each ``ldc_w`` occupies three bytes. As a result, the bytecode length of this
method is *more than 65535*, which exceeds the JVM limit for a single method.
{code}
static {};
Code:
0: bipush 70
2: anewarray #2965 // class java/lang/String
5: dup
6: iconst_0
7: ldc_w #5240 // String ...
10: aastore
11: dup
12: iconst_1
13: ldc_w #2924 // String
16: aastore
17: dup
18: iconst_2
19: ldc_w #2924 // String
22: aastore
23: dup
24: iconst_3
25: ldc_w #2924 // String
28: aastore
...
65533: lconst_0
65534: lastore
...
}
}
{code}
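To put a rough number on the two listings above, here is a small sketch of the arithmetic. It assumes that widening ``ldc`` to ``ldc_w`` is the only change the Shade plugin makes to this method, and that the ``return`` at offset 59900 is its last instruction. The class and method names are only for illustration.

```java
// Sketch: how much room the unshaded static initializer has before it
// exceeds the JVM limit on the code length of a single method.
final class CodeSizeHeadroom {
    // The JVM requires code_length to be less than 65536.
    static final int MAX_CODE_LENGTH = 65535;

    // Code length = offset of the last instruction + its encoded size.
    static int codeLength(int lastOffset, int lastInstructionSize) {
        return lastOffset + lastInstructionSize;
    }

    // Number of extra bytes the method can absorb and still be valid.
    static int headroom(int codeLength) {
        return MAX_CODE_LENGTH - codeLength;
    }

    public static void main(String[] args) {
        // From the first listing: "59900: return", and return is one byte.
        int unshaded = codeLength(59900, 1);
        System.out.println("unshaded code length: " + unshaded);
        System.out.println("headroom in bytes:    " + headroom(unshaded));
    }
}
```

Each ``ldc`` that becomes ``ldc_w`` costs one extra byte, so under these assumptions the method goes over the limit once more than 5634 of its ``ldc`` instructions are widened.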
The Shade plugin seems to rebuild the constant pool, based on [this
comment|http://svn.apache.org/viewvc/maven/plugins/tags/maven-shade-plugin-2.4.3/src/main/java/org/apache/maven/plugins/shade/DefaultShader.java?view=markup#l417].
The many String definitions consume a lot of constant pool entries, which may
push the entry indices higher. Because ``ldc`` takes only a one-byte index, an
entry with an index above 255 must be loaded with ``ldc_w``, so ``ldc`` is
replaced with ``ldc_w``. As a result, the length of the Java bytecode increases.
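The index rule can be sketched as follows. This is only an illustration of the instruction encoding, not code from the Shade plugin or ASM; the class and method names are made up, and the two indices are the ones shown in the listings (#127 before shading, #2924 after).

```java
// Sketch: the constant pool index decides which load instruction is needed.
final class LdcWidth {
    // ldc carries a one-byte index, so it can only address entries up to 255.
    static final int MAX_LDC_INDEX = 255;

    // Encoded size in bytes of the instruction that loads a String constant
    // stored at the given constant pool index.
    static int loadSize(int cpIndex) {
        if (cpIndex < 1 || cpIndex > 65535) {
            throw new IllegalArgumentException("invalid constant pool index: " + cpIndex);
        }
        // ldc   = opcode + 1-byte index = 2 bytes
        // ldc_w = opcode + 2-byte index = 3 bytes
        return cpIndex <= MAX_LDC_INDEX ? 2 : 3;
    }

    static String mnemonic(int cpIndex) {
        return loadSize(cpIndex) == 2 ? "ldc" : "ldc_w";
    }

    public static void main(String[] args) {
        for (int index : new int[] {127, 2924}) {
            System.out.println("#" + index + " -> " + mnemonic(index)
                    + " (" + loadSize(index) + " bytes)");
        }
    }
}
```

This prints ``#127 -> ldc (2 bytes)`` and ``#2924 -> ldc_w (3 bytes)``, which matches the one-byte growth per load seen between the two listings.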
What should we do as a next step?
* Can we avoid this rebuild by an option?
* Can we create a pull request for the Shade plugin to avoid this?
* Can we use another plugin?
* Can we split ExpressionParser.g into smaller files?
* Other solutions?
> Maven build fails due to: Method code too large! in Catalyst
> ------------------------------------------------------------
>
> Key: SPARK-13431
> URL: https://issues.apache.org/jira/browse/SPARK-13431
> Project: Spark
> Issue Type: Bug
> Components: Build
> Affects Versions: 2.0.0
> Reporter: Stavros Kontopoulos
> Priority: Blocker
>
> Cannot build the project when running the normal build commands,
> e.g.
> {code}
> build/mvn -Phadoop-2.6 -Dhadoop.version=2.6.0 clean package
> ./make-distribution.sh --name test --tgz -Phadoop-2.6
> {code}
> Integration builds are also failing:
> https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-hadoop-2.6/229/console
> https://ci.typesafe.com/job/mit-docker-test-zk-ref/12/console
> It looks like this is the commit that introduced the issue:
> https://github.com/apache/spark/commit/7925071280bfa1570435bde3e93492eaf2167d56