mzheng-plaid opened a new issue, #9934:
URL: https://github.com/apache/hudi/issues/9934

   **Describe the problem you faced**
   We have a MOR table that is written to by a Spark Structured Streaming pipeline.
   
   We are seeing:
   ```
   py4j.protocol.Py4JJavaError: An error occurred while calling o355.save.
   : java.lang.OutOfMemoryError: Requested array size exceeds VM limit
        at java.lang.StringCoding.encode(StringCoding.java:350)
        at java.lang.String.getBytes(String.java:941)
        at org.apache.hudi.client.BaseHoodieWriteClient.commit(BaseHoodieWriteClient.java:292)
        at org.apache.hudi.client.BaseHoodieWriteClient.commitStats(BaseHoodieWriteClient.java:243)
        at org.apache.hudi.client.SparkRDDWriteClient.commit(SparkRDDWriteClient.java:126)
        at org.apache.hudi.HoodieSparkSqlWriter$.commitAndPerformPostOperations(HoodieSparkSqlWriter.scala:701)
        at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:345)
        at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:145)
        at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:47)
        at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:75)
        at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:73)
        at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:84)
        at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:104)
        at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1$$Lambda$4086/588517446.apply(Unknown Source)
        at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
        at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:224)
        at org.apache.spark.sql.execution.SQLExecution$.executeQuery$1(SQLExecution.scala:114)
        at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$7(SQLExecution.scala:139)
        at org.apache.spark.sql.execution.SQLExecution$$$Lambda$2384/2044625832.apply(Unknown Source)
        at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
        at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:224)
        at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:139)
        at org.apache.spark.sql.execution.SQLExecution$$$Lambda$2383/299085843.apply(Unknown Source)
        at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:245)
        at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:138)
        at org.apache.spark.sql.execution.SQLExecution$$$Lambda$2373/595359931.apply(Unknown Source)
        at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
        at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:68)
        at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:101)
        at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:97)
        at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:626)
        at org.apache.spark.sql.catalyst.trees.TreeNode$$Lambda$1362/580263940.apply(Unknown Source)
   ```
   
   It seems this is happening on commit (i.e., the data itself is written successfully), and each time it retries it has to roll back, with each rollback getting more and more expensive.
   
   **To Reproduce**
   
   Unclear.
   
   **Expected behavior**
   
   We are not sure how to recover from this bad state. Is this loading the `deltacommits` from the timeline and trying to create an array that's too large? Or is the stack trace indicating a problem with the current batch? (We've tried turning down the batch size with no change.)
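   A speculative back-of-the-envelope check (our assumption, not a confirmed diagnosis): the failing frames are `String.getBytes` inside `BaseHoodieWriteClient.commit`, which serializes the commit metadata to a byte array, and a JVM array cannot exceed roughly `Integer.MAX_VALUE` elements. The `3` worst-case bytes-per-char factor for UTF-8 pre-allocation is also an assumption on our part:

```python
# Hypothetical sanity check: would UTF-8-encoding a Java String of a given
# length require a byte array larger than the JVM's hard cap?
JVM_MAX_ARRAY = 2**31 - 1  # Integer.MAX_VALUE, the cap on JVM array length


def exceeds_jvm_array_limit(metadata_chars: int, max_bytes_per_char: int = 3) -> bool:
    """Worst-case pre-allocation for encoding `metadata_chars` chars to UTF-8
    (up to 3 bytes per char for BMP code points) vs. the JVM array cap."""
    return metadata_chars * max_bytes_per_char > JVM_MAX_ARRAY


# A metadata string in the high hundreds of millions of characters would
# already trip the limit at the worst-case 3x expansion:
print(exceeds_jvm_array_limit(800_000_000))  # → True
print(exceeds_jvm_array_limit(100_000_000))  # → False
```

   If this reasoning holds, the size of the serialized commit metadata (not the batch of records itself) would be what matters, which could explain why shrinking the batch size had no effect.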
   
   EMR 6.10.1
   
   * Hudi version : 0.12.2-amzn-0
   
   * Spark version : 3.3.1
   
   * Hive version : 3.1.3
   
   * Hadoop version : 3.3.3
   
   * Storage (HDFS/S3/GCS..) : S3
   
   * Running on Docker? (yes/no) : Spark on Docker
   
   
   **Additional context**
   
   The contents of the table's `.hoodie` path are below:
   ```
                              PRE .aux/
   2023-10-12 15:58:18          0 .aux_$folder$
   2023-10-12 15:58:17          0 .schema_$folder$
   2023-10-12 15:58:17          0 .temp_$folder$
   2023-10-23 11:56:36      13120 20231023185628734.deltacommit
   2023-10-23 11:56:33        786 20231023185628734.deltacommit.inflight
   2023-10-23 11:56:30          0 20231023185628734.deltacommit.requested
   2023-10-23 11:57:01      13120 20231023185655034.deltacommit
   2023-10-23 11:56:59        786 20231023185655034.deltacommit.inflight
   2023-10-23 11:56:56          0 20231023185655034.deltacommit.requested
   2023-10-23 11:57:29      13120 20231023185724079.deltacommit
   2023-10-23 11:57:27        786 20231023185724079.deltacommit.inflight
   2023-10-23 11:57:25          0 20231023185724079.deltacommit.requested
   2023-10-23 11:58:22      13120 20231023185816050.deltacommit
   2023-10-23 11:58:20        786 20231023185816050.deltacommit.inflight
   2023-10-23 11:58:17          0 20231023185816050.deltacommit.requested
   2023-10-23 11:58:47      13120 20231023185842072.deltacommit
   2023-10-23 11:58:46        786 20231023185842072.deltacommit.inflight
   2023-10-23 11:58:43          0 20231023185842072.deltacommit.requested
   2023-10-23 11:59:18      13120 20231023185912986.deltacommit
   2023-10-23 11:59:17        786 20231023185912986.deltacommit.inflight
   2023-10-23 11:59:14          0 20231023185912986.deltacommit.requested
   2023-10-23 11:59:49      13120 20231023185943122.deltacommit
   2023-10-23 11:59:47        786 20231023185943122.deltacommit.inflight
   2023-10-23 11:59:44          0 20231023185943122.deltacommit.requested
   2023-10-23 12:00:28      13122 20231023190022141.deltacommit
   2023-10-23 12:00:26        787 20231023190022141.deltacommit.inflight
   2023-10-23 12:00:23          0 20231023190022141.deltacommit.requested
   2023-10-23 12:00:54      13120 20231023190048634.deltacommit
   2023-10-23 12:00:52        786 20231023190048634.deltacommit.inflight
   2023-10-23 12:00:49          0 20231023190048634.deltacommit.requested
   2023-10-23 12:01:22      13120 20231023190116217.deltacommit
   2023-10-23 12:01:20        786 20231023190116217.deltacommit.inflight
   2023-10-23 12:01:17          0 20231023190116217.deltacommit.requested
   2023-10-23 12:02:03      13120 20231023190156690.deltacommit
   2023-10-23 12:02:01        786 20231023190156690.deltacommit.inflight
   2023-10-23 12:01:58          0 20231023190156690.deltacommit.requested
   2023-10-23 12:02:25      13120 20231023190219364.deltacommit
   2023-10-23 12:02:23        786 20231023190219364.deltacommit.inflight
   2023-10-23 12:02:20          0 20231023190219364.deltacommit.requested
   2023-10-23 12:02:50      13120 20231023190244765.deltacommit
   2023-10-23 12:02:48        786 20231023190244765.deltacommit.inflight
   2023-10-23 12:02:46          0 20231023190244765.deltacommit.requested
   2023-10-24 01:54:35  443229653 20231024055929151.deltacommit
   2023-10-24 00:19:04  254996912 20231024055929151.deltacommit.inflight
   2023-10-23 22:59:31          0 20231024055929151.deltacommit.requested
   2023-10-24 04:53:28   53730189 20231024085459362.commit
   2023-10-24 01:55:19          0 20231024085459362.compaction.inflight
   2023-10-24 01:55:11   21504538 20231024085459362.compaction.requested
   2023-10-24 11:33:40  470361859 20231024121526121.deltacommit
   2023-10-24 07:13:11  255377024 20231024121526121.deltacommit.inflight
   2023-10-24 05:15:28          0 20231024121526121.deltacommit.requested
   2023-10-24 11:35:57   19591464 20231024183342063.clean
   2023-10-24 11:34:17   19546348 20231024183342063.clean.inflight
   2023-10-24 11:34:16   19546348 20231024183342063.clean.requested
   2023-10-24 14:14:31   53889946 20231024183558057.commit
   2023-10-24 11:36:47          0 20231024183558057.compaction.inflight
   2023-10-24 11:36:33   30504528 20231024183558057.compaction.requested
   2023-10-24 19:34:43  505473946 20231024220832622.deltacommit
   2023-10-24 18:16:49  255140523 20231024220832622.deltacommit.inflight
   2023-10-24 15:08:36          0 20231024220832622.deltacommit.requested
   2023-10-24 19:37:57   37591740 20231025023444653.clean
   2023-10-24 19:35:37   37568345 20231025023444653.clean.inflight
   2023-10-24 19:35:35   37568345 20231025023444653.clean.requested
   2023-10-25 01:38:14  537096732 20231025030135667.deltacommit
   2023-10-24 21:02:51  254722288 20231025030135667.deltacommit.inflight
   2023-10-24 20:01:53          0 20231025030135667.deltacommit.requested
   2023-10-25 01:39:22     424395 20231025083816213.clean
   2023-10-25 01:39:19     347630 20231025083816213.clean.inflight
   2023-10-25 01:39:18     347630 20231025083816213.clean.requested
   2023-10-25 03:41:56  584339710 20231025085335035.deltacommit
   2023-10-25 02:37:33  255087790 20231025085335035.deltacommit.inflight
   2023-10-25 01:53:51          0 20231025085335035.deltacommit.requested
   2023-10-25 03:43:09     409631 20231025104158565.clean
   2023-10-25 03:43:06     337563 20231025104158565.clean.inflight
   2023-10-25 03:43:06     337563 20231025104158565.clean.requested
   2023-10-25 05:41:13  614870598 20231025105625017.deltacommit
   2023-10-25 04:38:07  254830363 20231025105625017.deltacommit.inflight
   2023-10-25 03:56:39          0 20231025105625017.deltacommit.requested
   2023-10-25 05:42:36     404161 20231025124115437.clean
   2023-10-25 05:42:33     330608 20231025124115437.clean.inflight
   2023-10-25 05:42:33     330608 20231025124115437.clean.requested
   2023-10-25 07:51:07  647273329 20231025125551873.deltacommit
   2023-10-25 06:34:37  255456811 20231025125551873.deltacommit.inflight
   2023-10-25 05:56:09          0 20231025125551873.deltacommit.requested
   2023-10-25 07:53:02     399953 20231025145110208.clean
   2023-10-25 07:53:00     325834 20231025145110208.clean.inflight
   2023-10-25 07:52:59     325834 20231025145110208.clean.requested
   2023-10-25 19:41:02   42162636 20231025225741702.rollback
   2023-10-25 16:13:52          0 20231025225741702.rollback.inflight
   2023-10-25 16:13:50   66193583 20231025225741702.rollback.requested
   2023-10-26 00:51:56   56178946 20231026050753360.rollback
   2023-10-25 22:32:23          0 20231026050753360.rollback.inflight
   2023-10-25 22:32:21   87498808 20231026050753360.rollback.requested
   2023-10-26 05:26:47   56204991 20231026105209390.rollback
   2023-10-26 04:21:34          0 20231026105209390.rollback.inflight
   2023-10-26 04:21:31   87609453 20231026105209390.rollback.requested
   2023-10-26 08:11:51   56207342 20231026143225288.rollback
   2023-10-26 07:43:56          0 20231026143225288.rollback.inflight
   2023-10-26 07:43:54   87607436 20231026143225288.rollback.requested
   2023-10-26 12:55:52   56706727 20231026182236507.rollback
   2023-10-26 12:02:30          0 20231026182236507.rollback.inflight
   2023-10-26 12:02:27   88394276 20231026182236507.rollback.requested
   2023-10-26 17:56:17   56207673 20231027000817536.rollback
   2023-10-26 17:23:08          0 20231027000817536.rollback.inflight
   2023-10-26 17:23:06   87626942 20231027000817536.rollback.requested
   2023-10-27 00:34:06          0 20231027045612937.deltacommit.requested
   2023-10-27 00:34:01   72095106 20231027045638746.rollback
   2023-10-26 22:34:27          0 20231027045638746.rollback.inflight
   2023-10-26 22:34:24  112381340 20231027045638746.rollback.requested
   2023-10-12 15:58:17          0 archived_$folder$
   2023-10-12 15:58:18        884 hoodie.properties
   ```
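   To make the suspects easier to spot in a listing like the one above, here is a small helper (purely illustrative, not part of Hudi or the AWS CLI) that parses `aws s3 ls`-style lines and returns the largest timeline files first:

```python
# Hypothetical helper: rank timeline files from an `aws s3 ls`-style listing
# by size, largest first, to surface oversized instants.
def largest_instants(listing: str, top_n: int = 3) -> list[tuple[int, str]]:
    """Parse lines of the form 'DATE TIME SIZE NAME' and return the
    top_n (size, name) pairs, largest first."""
    rows = []
    for line in listing.strip().splitlines():
        parts = line.split()
        if len(parts) == 4 and parts[2].isdigit():
            rows.append((int(parts[2]), parts[3]))
    return sorted(rows, reverse=True)[:top_n]


sample = """
2023-10-25 07:51:07  647273329 20231025125551873.deltacommit
2023-10-26 22:34:24  112381340 20231027045638746.rollback.requested
2023-10-12 15:58:18        884 hoodie.properties
"""
print(largest_instants(sample, 2))
# → [(647273329, '20231025125551873.deltacommit'), (112381340, '20231027045638746.rollback.requested')]
```

   Running this over the full listing shows the `.deltacommit` and `.rollback.requested` files growing into the hundreds of megabytes, which is what made us suspect the commit-metadata serialization in the first place.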
   
   Hudi properties:
   ```
   #Properties saved on 2023-10-12T22:58:17.872Z
   #Thu Oct 12 22:58:17 UTC 2023
   hoodie.table.timeline.timezone=LOCAL
   hoodie.table.keygenerator.class=org.apache.hudi.keygen.CustomKeyGenerator
   hoodie.table.precombine.field=publishedAtUnixNano
   hoodie.table.version=5
   hoodie.database.name=
   hoodie.datasource.write.hive_style_partitioning=true
   hoodie.partition.metafile.use.base.format=false
   hoodie.archivelog.folder=archived
   hoodie.table.name=xxx
   hoodie.compaction.payload.class=org.apache.hudi.common.model.DefaultHoodieRecordPayload
   hoodie.populate.meta.fields=true
   hoodie.table.type=MERGE_ON_READ
   hoodie.datasource.write.partitionpath.urlencode=false
   hoodie.table.base.file.format=PARQUET
   hoodie.datasource.write.drop.partition.columns=false
   hoodie.timeline.layout.version=1
   hoodie.table.partition.fields=dt
   hoodie.table.recordkey.fields=id.value
   hoodie.table.checksum=3616660964
   ```
   
   **Stacktrace**
   
   See above.
   
   

