luyongbiao commented on issue #8416:
URL: https://github.com/apache/hudi/issues/8416#issuecomment-1502232116

   @ad1happy2go Thanks, your solution fixed my problem.
   However, when I call `dataset.withColumn(newColumnName, functions.expr(expression))` with a long expression before writing, the data is lost again.
   The write action itself completes successfully, but the stack trace reports an error raised from the `createRdd` method in `HoodieSparkUtils.scala`:
   ```
   org.codehaus.janino.InternalCompilerException: Compiling "GeneratedClass": Code of method "processNext()V" of class "org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1" grows beyond 64 KB
       at org.codehaus.janino.UnitCompiler.compileUnit(UnitCompiler.java:382) ~[janino-3.0.9.jar:na]
       at org.codehaus.janino.SimpleCompiler.cook(SimpleCompiler.java:237) ~[janino-3.0.9.jar:na]
       at org.codehaus.janino.SimpleCompiler.compileToClassLoader(SimpleCompiler.java:465) ~[janino-3.0.9.jar:na]
       at org.codehaus.janino.ClassBodyEvaluator.compileToClass(ClassBodyEvaluator.java:313) ~[janino-3.0.9.jar:na]
       at org.codehaus.janino.ClassBodyEvaluator.cook(ClassBodyEvaluator.java:235) ~[janino-3.0.9.jar:na]
       at org.codehaus.janino.SimpleCompiler.cook(SimpleCompiler.java:207) ~[janino-3.0.9.jar:na]
       at org.codehaus.commons.compiler.Cookable.cook(Cookable.java:80) ~[commons-compiler-3.0.9.jar:na]
       at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.org$apache$spark$sql$catalyst$expressions$codegen$CodeGenerator$$doCompile(CodeGenerator.scala:1403) [spark-catalyst_2.12-3.1.1.jar:3.1.1]
       at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:1500) [spark-catalyst_2.12-3.1.1.jar:3.1.1]
       at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:1497) [spark-catalyst_2.12-3.1.1.jar:3.1.1]
       at org.sparkproject.guava.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599) [spark-network-common_2.12-3.1.1.jar:3.1.1]
       at org.sparkproject.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2379) [spark-network-common_2.12-3.1.1.jar:3.1.1]
       at org.sparkproject.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342) [spark-network-common_2.12-3.1.1.jar:3.1.1]
       at org.sparkproject.guava.cache.LocalCache$Segment.get(LocalCache.java:2257) [spark-network-common_2.12-3.1.1.jar:3.1.1]
       at org.sparkproject.guava.cache.LocalCache.get(LocalCache.java:4000) [spark-network-common_2.12-3.1.1.jar:3.1.1]
       at org.sparkproject.guava.cache.LocalCache.getOrLoad(LocalCache.java:4004) [spark-network-common_2.12-3.1.1.jar:3.1.1]
       at org.sparkproject.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874) [spark-network-common_2.12-3.1.1.jar:3.1.1]
       at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.compile(CodeGenerator.scala:1351) [spark-catalyst_2.12-3.1.1.jar:3.1.1]
       at org.apache.spark.sql.execution.WholeStageCodegenExec.liftedTree1$1(WholeStageCodegenExec.scala:721) [spark-sql_2.12-3.1.1.jar:3.1.1]
       at org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:720) [spark-sql_2.12-3.1.1.jar:3.1.1]
       at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:180) [spark-sql_2.12-3.1.1.jar:3.1.1]
       at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:218) [spark-sql_2.12-3.1.1.jar:3.1.1]
       at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) ~[spark-core_2.12-3.1.1.jar:3.1.1]
       at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:215) [spark-sql_2.12-3.1.1.jar:3.1.1]
       at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:176) [spark-sql_2.12-3.1.1.jar:3.1.1]
       at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:132) ~[spark-sql_2.12-3.1.1.jar:3.1.1]
       at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:131) ~[spark-sql_2.12-3.1.1.jar:3.1.1]
       at org.apache.hudi.HoodieSparkUtils$.createRdd(HoodieSparkUtils.scala:101) ~[hudi-spark3.1-bundle_2.12-0.12.2.jar:0.12.2]
       at org.apache.hudi.HoodieSparkUtils$.createRdd(HoodieSparkUtils.scala:79) ~[hudi-spark3.1-bundle_2.12-0.12.2.jar:0.12.2]
       at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:292) ~[hudi-spark3.1-bundle_2.12-0.12.2.jar:0.12.2]
       at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:145) ~[hudi-spark3.1-bundle_2.12-0.12.2.jar:0.12.2]
       at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:46) ~[spark-sql_2.12-3.1.1.jar:3.1.1]
       at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70) ~[spark-sql_2.12-3.1.1.jar:3.1.1]
       at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68) ~[spark-sql_2.12-3.1.1.jar:3.1.1]
       at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:90) ~[spark-sql_2.12-3.1.1.jar:3.1.1]
       at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:180) [spark-sql_2.12-3.1.1.jar:3.1.1]
       at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:218) [spark-sql_2.12-3.1.1.jar:3.1.1]
       at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) ~[spark-core_2.12-3.1.1.jar:3.1.1]
       at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:215) [spark-sql_2.12-3.1.1.jar:3.1.1]
       at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:176) [spark-sql_2.12-3.1.1.jar:3.1.1]
       at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:132) ~[spark-sql_2.12-3.1.1.jar:3.1.1]
       at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:131) ~[spark-sql_2.12-3.1.1.jar:3.1.1]
       at org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:989) ~[spark-sql_2.12-3.1.1.jar:3.1.1]
       at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103) ~[spark-sql_2.12-3.1.1.jar:3.1.1]
       at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163) ~[spark-sql_2.12-3.1.1.jar:3.1.1]
       at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90) ~[spark-sql_2.12-3.1.1.jar:3.1.1]
       at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:772) ~[spark-sql_2.12-3.1.1.jar:3.1.1]
       at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64) ~[spark-sql_2.12-3.1.1.jar:3.1.1]
       at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:989) ~[spark-sql_2.12-3.1.1.jar:3.1.1]
       at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:438) ~[spark-sql_2.12-3.1.1.jar:3.1.1]
       at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:415) ~[spark-sql_2.12-3.1.1.jar:3.1.1]
       at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:293) ~[spark-sql_2.12-3.1.1.jar:3.1.1]
   ```
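   For context, the `grows beyond 64 KB` error comes from Spark's whole-stage code generation hitting the JVM's 64 KB bytecode limit per method when compiling the plan that contains the long expression; it is not raised by Hudi itself, which is why it surfaces only once `createRdd` forces the plan to execute. A possible workaround (an assumption on my side, not a confirmed fix) is to disable whole-stage codegen so Spark falls back to the interpreted execution path for this plan:

   ```properties
   # Possible workaround (assumption, not a confirmed Hudi fix):
   # turn off whole-stage code generation so Spark does not try to
   # compile the entire stage, including the long expression, into
   # one generated Java method that exceeds the 64 KB limit.
   # Set in spark-defaults.conf, or pass via --conf on spark-submit.
   spark.sql.codegen.wholeStage=false
   ```

   Alternatively, breaking the long expression into several smaller `withColumn` calls may keep each generated method under the limit, at the cost of a slightly deeper plan.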
   

