[ https://issues.apache.org/jira/browse/SPARK-18492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16552242#comment-16552242 ]
Tang Yu Jie commented on SPARK-18492:
-------------------------------------
I have encountered this problem as well. Have you resolved it?
> GeneratedIterator grows beyond 64 KB
> ------------------------------------
>
> Key: SPARK-18492
> URL: https://issues.apache.org/jira/browse/SPARK-18492
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.0.1
> Environment: CentOS release 6.7 (Final)
> Reporter: Norris Merritt
> Priority: Major
> Attachments: Screenshot from 2018-03-02 12-57-51.png
>
>
> spark-submit fails with ERROR CodeGenerator: failed to compile:
> org.codehaus.janino.JaninoRuntimeException: Code of method
> "(I[Lscala/collection/Iterator;)V" of class
> "org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator"
> grows beyond 64 KB
> The error message is followed by a huge dump of the generated source code.
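> As a workaround, disabling whole-stage code generation avoids compiling the
> oversized class entirely, at the cost of slower interpreted execution. A
> minimal sketch (only the spark.sql.codegen.wholeStage setting matters; the
> rest of the job is unchanged):
>
> import org.apache.spark.sql.SparkSession;
>
> public class DisableWholeStageCodegen {
>   public static void main(String[] args) {
>     SparkSession spark = SparkSession.builder()
>         .appName("spark-18492-workaround")
>         // Fall back to interpreted execution so the giant
>         // GeneratedIterator class is never generated or compiled.
>         .config("spark.sql.codegen.wholeStage", "false")
>         .getOrCreate();
>     // ... run the failing query here ...
>     spark.stop();
>   }
> }
>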
> The generated code declares 1,454 field sequences like the following:
> /* 036 */ private org.apache.spark.sql.catalyst.expressions.ScalaUDF
> project_scalaUDF1;
> /* 037 */ private scala.Function1 project_catalystConverter1;
> /* 038 */ private scala.Function1 project_converter1;
> /* 039 */ private scala.Function1 project_converter2;
> /* 040 */ private scala.Function2 project_udf1;
> .... (many omitted lines) ...
> /* 6089 */ private org.apache.spark.sql.catalyst.expressions.ScalaUDF
> project_scalaUDF1454;
> /* 6090 */ private scala.Function1 project_catalystConverter1454;
> /* 6091 */ private scala.Function1 project_converter1695;
> /* 6092 */ private scala.Function1 project_udf1454;
> It then emits code for several methods (init, processNext), each of which
> contains a repetitive sequence of statements for every group of variables
> declared in the class. For example:
> /* 6101 */ public void init(int index, scala.collection.Iterator inputs[]) {
> The 64 KB JVM limit on the bytecode of a single method is exceeded because
> the code generator uses a naive strategy: inside the init method it emits a
> sequence like the one shown below for each of the 1,454 groups of variables
> declared above:
> /* 6132 */ this.project_udf =
> (scala.Function1)project_scalaUDF.userDefinedFunc();
> /* 6133 */ this.project_scalaUDF1 =
> (org.apache.spark.sql.catalyst.expressions.ScalaUDF) references[10];
> /* 6134 */ this.project_catalystConverter1 =
> (scala.Function1)org.apache.spark.sql.catalyst.CatalystTypeConverters$.MODULE$.createToCatalystConverter(project_scalaUDF1.dataType());
> /* 6135 */ this.project_converter1 =
> (scala.Function1)org.apache.spark.sql.catalyst.CatalystTypeConverters$.MODULE$.createToScalaConverter(((org.apache.spark.sql.catalyst.expressions.Expression)(((org.apache.spark.sql.catalyst.expressions.ScalaUDF)references[10]).getChildren().apply(0))).dataType());
> /* 6136 */ this.project_converter2 =
> (scala.Function1)org.apache.spark.sql.catalyst.CatalystTypeConverters$.MODULE$.createToScalaConverter(((org.apache.spark.sql.catalyst.expressions.Expression)(((org.apache.spark.sql.catalyst.expressions.ScalaUDF)references[10]).getChildren().apply(1))).dataType());
> It blows up after emitting 230 such sequences, while trying to emit the 231st:
> /* 7282 */ this.project_udf230 =
> (scala.Function2)project_scalaUDF230.userDefinedFunc();
> /* 7283 */ this.project_scalaUDF231 =
> (org.apache.spark.sql.catalyst.expressions.ScalaUDF) references[240];
> /* 7284 */ this.project_catalystConverter231 =
> (scala.Function1)org.apache.spark.sql.catalyst.CatalystTypeConverters$.MODULE$.createToCatalystConverter(project_scalaUDF231.dataType());
> .... many omitted lines ...
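> For scale: a job that chains on the order of a thousand Scala UDF calls in a
> single projection is enough to produce a dump of this shape. A minimal
> reproduction sketch (the UDF, data, and call count are illustrative guesses,
> not the actual failing job):
>
> import java.util.Arrays;
> import org.apache.spark.sql.*;
> import org.apache.spark.sql.api.java.UDF1;
> import org.apache.spark.sql.types.DataTypes;
> import static org.apache.spark.sql.functions.*;
>
> public class Udf64KbRepro {
>   public static void main(String[] args) {
>     SparkSession spark = SparkSession.builder()
>         .appName("spark-18492-repro").master("local[*]").getOrCreate();
>
>     // Each call site of a registered UDF becomes one
>     // project_scalaUDFN / project_converterN field group.
>     spark.udf().register("f", (UDF1<String, String>) s -> s + "x",
>         DataTypes.StringType);
>
>     Column c = col("value");
>     for (int i = 0; i < 1500; i++) {
>       c = callUDF("f", c);   // nest many UDF calls in one Project
>     }
>
>     Dataset<Row> df = spark
>         .createDataset(Arrays.asList("a", "b"), Encoders.STRING())
>         .toDF("value");
>     df.select(c).show();     // init() grows beyond 64 KB and fails to compile
>   }
> }
>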
> An example of the repetitive code emitted for the processNext method:
> /* 12253 */ boolean project_isNull247 = project_result244 == null;
> /* 12254 */ MapData project_value247 = null;
> /* 12255 */ if (!project_isNull247) {
> /* 12256 */ project_value247 = project_result244;
> /* 12257 */ }
> /* 12258 */ Object project_arg = sort_isNull5 ? null :
> project_converter489.apply(sort_value5);
> /* 12259 */
> /* 12260 */ ArrayData project_result249 = null;
> /* 12261 */ try {
> /* 12262 */ project_result249 =
> (ArrayData)project_catalystConverter248.apply(project_udf248.apply(project_arg));
> /* 12263 */ } catch (Exception e) {
> /* 12264 */ throw new
> org.apache.spark.SparkException(project_scalaUDF248.udfErrorMessage(), e);
> /* 12265 */ }
> /* 12266 */
> /* 12267 */ boolean project_isNull252 = project_result249 == null;
> /* 12268 */ ArrayData project_value252 = null;
> /* 12269 */ if (!project_isNull252) {
> /* 12270 */ project_value252 = project_result249;
> /* 12271 */ }
> /* 12272 */ Object project_arg1 = project_isNull252 ? null :
> project_converter488.apply(project_value252);
> /* 12273 */
> /* 12274 */ ArrayData project_result248 = null;
> /* 12275 */ try {
> /* 12276 */ project_result248 =
> (ArrayData)project_catalystConverter247.apply(project_udf247.apply(project_arg1));
> /* 12277 */ } catch (Exception e) {
> /* 12278 */ throw new
> org.apache.spark.SparkException(project_scalaUDF247.udfErrorMessage(), e);
> /* 12279 */ }
> /* 12280 */
> /* 12281 */ boolean project_isNull251 = project_result248 == null;
> /* 12282 */ ArrayData project_value251 = null;
> /* 12283 */ if (!project_isNull251) {
> /* 12284 */ project_value251 = project_result248;
> /* 12285 */ }
> /* 12286 */ Object project_arg2 = project_isNull251 ? null :
> project_converter487.apply(project_value251);
> /* 12287 */
> /* 12288 */ InternalRow project_result247 = null;
> /* 12289 */ try {
> /* 12290 */ project_result247 =
> (InternalRow)project_catalystConverter246.apply(project_udf246.apply(project_arg2));
> /* 12291 */ } catch (Exception e) {
> /* 12292 */ throw new
> org.apache.spark.SparkException(project_scalaUDF246.udfErrorMessage(), e);
> /* 12293 */ }
> /* 12294 */
> /* 12295 */ boolean project_isNull250 = project_result247 == null;
> /* 12296 */ InternalRow project_value250 = null;
> /* 12297 */ if (!project_isNull250) {
> /* 12298 */ project_value250 = project_result247;
> /* 12299 */ }
> /* 12300 */ Object project_arg3 = project_isNull250 ? null :
> project_converter486.apply(project_value250);
> /* 12301 */
> /* 12302 */ InternalRow project_result246 = null;
> /* 12303 */ try {
> /* 12304 */ project_result246 =
> (InternalRow)project_catalystConverter245.apply(project_udf245.apply(project_arg3));
> /* 12305 */ } catch (Exception e) {
> /* 12306 */ throw new
> org.apache.spark.SparkException(project_scalaUDF245.udfErrorMessage(), e);
> /* 12307 */ }
> /* 12308 */
> The code generation strategy is clearly naive. The generator should use
> arrays and loops instead of emitting these repetitive code sequences, which
> differ only in the numeric suffixes of the variable names.
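> Concretely, the init() boilerplate shown above could collapse into array
> fields filled by a loop, along these lines (an illustrative sketch in the
> style of the generated code, not actual Spark output; n and firstRef are
> hypothetical placeholders):
>
> private org.apache.spark.sql.catalyst.expressions.ScalaUDF[] project_scalaUDFs;
> private scala.Function1[] project_catalystConverters;
>
> public void init(int index, scala.collection.Iterator[] inputs) {
>   final int n = 1454;       // number of UDF groups in this plan
>   final int firstRef = 10;  // hypothetical offset into references[]
>   project_scalaUDFs = new org.apache.spark.sql.catalyst.expressions.ScalaUDF[n];
>   project_catalystConverters = new scala.Function1[n];
>   for (int i = 0; i < n; i++) {
>     project_scalaUDFs[i] =
>         (org.apache.spark.sql.catalyst.expressions.ScalaUDF) references[firstRef + i];
>     project_catalystConverters[i] = (scala.Function1)
>         org.apache.spark.sql.catalyst.CatalystTypeConverters$.MODULE$
>             .createToCatalystConverter(project_scalaUDFs[i].dataType());
>   }
> }
>
> The same array indexing would let processNext loop over the converters
> instead of repeating the try/catch block once per variable group.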