mzheng-plaid opened a new issue, #9934:
URL: https://github.com/apache/hudi/issues/9934
**Describe the problem you faced**
We have a MOR table that a Spark Structured Streaming pipeline writes to.
We are seeing:
```
py4j.protocol.Py4JJavaError: An error occurred while calling o355.save.
: java.lang.OutOfMemoryError: Requested array size exceeds VM limit
at java.lang.StringCoding.encode(StringCoding.java:350)
at java.lang.String.getBytes(String.java:941)
at org.apache.hudi.client.BaseHoodieWriteClient.commit(BaseHoodieWriteClient.java:292)
at org.apache.hudi.client.BaseHoodieWriteClient.commitStats(BaseHoodieWriteClient.java:243)
at org.apache.hudi.client.SparkRDDWriteClient.commit(SparkRDDWriteClient.java:126)
at org.apache.hudi.HoodieSparkSqlWriter$.commitAndPerformPostOperations(HoodieSparkSqlWriter.scala:701)
at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:345)
at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:145)
at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:47)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:75)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:73)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:84)
at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:104)
at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1$$Lambda$4086/588517446.apply(Unknown Source)
at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:224)
at org.apache.spark.sql.execution.SQLExecution$.executeQuery$1(SQLExecution.scala:114)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$7(SQLExecution.scala:139)
at org.apache.spark.sql.execution.SQLExecution$$$Lambda$2384/2044625832.apply(Unknown Source)
at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:224)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:139)
at org.apache.spark.sql.execution.SQLExecution$$$Lambda$2383/299085843.apply(Unknown Source)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:245)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:138)
at org.apache.spark.sql.execution.SQLExecution$$$Lambda$2373/595359931.apply(Unknown Source)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:68)
at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:101)
at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:97)
at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:626)
at org.apache.spark.sql.catalyst.trees.TreeNode$$Lambda$1362/580263940.apply(Unknown Source)
```
It seems like this is happening on commit (i.e., the data itself is written
successfully), and each time the job retries it has to roll back the failed
write. Each rollback is getting more and more expensive.
**To Reproduce**
Unclear.
**Expected behavior**
We are not sure how to recover from this bad state. Is Hudi loading the
`deltacommits` from the timeline and trying to create an array that's too large?
Or does this stack trace indicate a problem with the current batch? (We've
tried turning down the batch size, with no change.)
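One rough way to sanity-check the "array too large" hypothesis is a back-of-the-envelope estimate from the sizes of the completed `.deltacommit` files in the `.hoodie` listing under Additional context. The sketch below rests on two assumptions that are not confirmed here: that the `String.getBytes` call in the stack trace pre-allocates a worst-case buffer of 3 bytes per character for UTF-8, and that the commit metadata is mostly ASCII, so file size in bytes roughly equals string length in characters. The function and constant names are made up for illustration.

```python
# Back-of-the-envelope estimate only. The 3x factor is an ASSUMPTION about how
# the JDK sizes the buffer in String.getBytes; it is not verified here.
INT_MAX = 2**31 - 1
ASSUMED_MAX_BYTES_PER_CHAR = 3

# Sizes in bytes of the large completed .deltacommit files, copied from the
# `.hoodie` listing in this issue (oldest first).
COMPLETED_DELTACOMMIT_BYTES = [
    443229653,
    470361859,
    505473946,
    537096732,
    584339710,
    614870598,
    647273329,
]


def max_chars_before_overflow(factor=ASSUMED_MAX_BYTES_PER_CHAR):
    """Largest string length whose worst-case buffer still fits in a Java int."""
    return INT_MAX // factor


def average_growth(sizes):
    """Mean size increase between consecutive completed deltacommits."""
    deltas = [later - earlier for earlier, later in zip(sizes, sizes[1:])]
    return sum(deltas) / len(deltas)


def commits_of_headroom(sizes, factor=ASSUMED_MAX_BYTES_PER_CHAR):
    """Estimated number of further commits before the threshold is crossed."""
    remaining = max_chars_before_overflow(factor) - sizes[-1]
    return remaining / average_growth(sizes)


if __name__ == "__main__":
    print("threshold (chars):", max_chars_before_overflow())
    print("last completed deltacommit (bytes):", COMPLETED_DELTACOMMIT_BYTES[-1])
    print("estimated commits of headroom:",
          round(commits_of_headroom(COMPLETED_DELTACOMMIT_BYTES), 1))
```

If those assumptions hold, the threshold is 715,827,882 characters, and the last completed deltacommit (647,273,329 bytes) left room for only about two more commits at the observed growth rate. That would point at the steadily growing commit metadata rather than the size of the current batch, but it is an estimate, not a diagnosis.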
EMR 6.10.1
* Hudi version : 0.12.2-amzn-0
* Spark version : 3.3.1
* Hive version : 3.1.3
* Hadoop version : 3.3.3
* Storage (HDFS/S3/GCS..) : S3
* Running on Docker? (yes/no) : Spark on Docker
**Additional context**
The `.hoodie` path is below:
```
PRE .aux/
2023-10-12 15:58:18 0 .aux_$folder$
2023-10-12 15:58:17 0 .schema_$folder$
2023-10-12 15:58:17 0 .temp_$folder$
2023-10-23 11:56:36 13120 20231023185628734.deltacommit
2023-10-23 11:56:33 786 20231023185628734.deltacommit.inflight
2023-10-23 11:56:30 0 20231023185628734.deltacommit.requested
2023-10-23 11:57:01 13120 20231023185655034.deltacommit
2023-10-23 11:56:59 786 20231023185655034.deltacommit.inflight
2023-10-23 11:56:56 0 20231023185655034.deltacommit.requested
2023-10-23 11:57:29 13120 20231023185724079.deltacommit
2023-10-23 11:57:27 786 20231023185724079.deltacommit.inflight
2023-10-23 11:57:25 0 20231023185724079.deltacommit.requested
2023-10-23 11:58:22 13120 20231023185816050.deltacommit
2023-10-23 11:58:20 786 20231023185816050.deltacommit.inflight
2023-10-23 11:58:17 0 20231023185816050.deltacommit.requested
2023-10-23 11:58:47 13120 20231023185842072.deltacommit
2023-10-23 11:58:46 786 20231023185842072.deltacommit.inflight
2023-10-23 11:58:43 0 20231023185842072.deltacommit.requested
2023-10-23 11:59:18 13120 20231023185912986.deltacommit
2023-10-23 11:59:17 786 20231023185912986.deltacommit.inflight
2023-10-23 11:59:14 0 20231023185912986.deltacommit.requested
2023-10-23 11:59:49 13120 20231023185943122.deltacommit
2023-10-23 11:59:47 786 20231023185943122.deltacommit.inflight
2023-10-23 11:59:44 0 20231023185943122.deltacommit.requested
2023-10-23 12:00:28 13122 20231023190022141.deltacommit
2023-10-23 12:00:26 787 20231023190022141.deltacommit.inflight
2023-10-23 12:00:23 0 20231023190022141.deltacommit.requested
2023-10-23 12:00:54 13120 20231023190048634.deltacommit
2023-10-23 12:00:52 786 20231023190048634.deltacommit.inflight
2023-10-23 12:00:49 0 20231023190048634.deltacommit.requested
2023-10-23 12:01:22 13120 20231023190116217.deltacommit
2023-10-23 12:01:20 786 20231023190116217.deltacommit.inflight
2023-10-23 12:01:17 0 20231023190116217.deltacommit.requested
2023-10-23 12:02:03 13120 20231023190156690.deltacommit
2023-10-23 12:02:01 786 20231023190156690.deltacommit.inflight
2023-10-23 12:01:58 0 20231023190156690.deltacommit.requested
2023-10-23 12:02:25 13120 20231023190219364.deltacommit
2023-10-23 12:02:23 786 20231023190219364.deltacommit.inflight
2023-10-23 12:02:20 0 20231023190219364.deltacommit.requested
2023-10-23 12:02:50 13120 20231023190244765.deltacommit
2023-10-23 12:02:48 786 20231023190244765.deltacommit.inflight
2023-10-23 12:02:46 0 20231023190244765.deltacommit.requested
2023-10-24 01:54:35 443229653 20231024055929151.deltacommit
2023-10-24 00:19:04 254996912 20231024055929151.deltacommit.inflight
2023-10-23 22:59:31 0 20231024055929151.deltacommit.requested
2023-10-24 04:53:28 53730189 20231024085459362.commit
2023-10-24 01:55:19 0 20231024085459362.compaction.inflight
2023-10-24 01:55:11 21504538 20231024085459362.compaction.requested
2023-10-24 11:33:40 470361859 20231024121526121.deltacommit
2023-10-24 07:13:11 255377024 20231024121526121.deltacommit.inflight
2023-10-24 05:15:28 0 20231024121526121.deltacommit.requested
2023-10-24 11:35:57 19591464 20231024183342063.clean
2023-10-24 11:34:17 19546348 20231024183342063.clean.inflight
2023-10-24 11:34:16 19546348 20231024183342063.clean.requested
2023-10-24 14:14:31 53889946 20231024183558057.commit
2023-10-24 11:36:47 0 20231024183558057.compaction.inflight
2023-10-24 11:36:33 30504528 20231024183558057.compaction.requested
2023-10-24 19:34:43 505473946 20231024220832622.deltacommit
2023-10-24 18:16:49 255140523 20231024220832622.deltacommit.inflight
2023-10-24 15:08:36 0 20231024220832622.deltacommit.requested
2023-10-24 19:37:57 37591740 20231025023444653.clean
2023-10-24 19:35:37 37568345 20231025023444653.clean.inflight
2023-10-24 19:35:35 37568345 20231025023444653.clean.requested
2023-10-25 01:38:14 537096732 20231025030135667.deltacommit
2023-10-24 21:02:51 254722288 20231025030135667.deltacommit.inflight
2023-10-24 20:01:53 0 20231025030135667.deltacommit.requested
2023-10-25 01:39:22 424395 20231025083816213.clean
2023-10-25 01:39:19 347630 20231025083816213.clean.inflight
2023-10-25 01:39:18 347630 20231025083816213.clean.requested
2023-10-25 03:41:56 584339710 20231025085335035.deltacommit
2023-10-25 02:37:33 255087790 20231025085335035.deltacommit.inflight
2023-10-25 01:53:51 0 20231025085335035.deltacommit.requested
2023-10-25 03:43:09 409631 20231025104158565.clean
2023-10-25 03:43:06 337563 20231025104158565.clean.inflight
2023-10-25 03:43:06 337563 20231025104158565.clean.requested
2023-10-25 05:41:13 614870598 20231025105625017.deltacommit
2023-10-25 04:38:07 254830363 20231025105625017.deltacommit.inflight
2023-10-25 03:56:39 0 20231025105625017.deltacommit.requested
2023-10-25 05:42:36 404161 20231025124115437.clean
2023-10-25 05:42:33 330608 20231025124115437.clean.inflight
2023-10-25 05:42:33 330608 20231025124115437.clean.requested
2023-10-25 07:51:07 647273329 20231025125551873.deltacommit
2023-10-25 06:34:37 255456811 20231025125551873.deltacommit.inflight
2023-10-25 05:56:09 0 20231025125551873.deltacommit.requested
2023-10-25 07:53:02 399953 20231025145110208.clean
2023-10-25 07:53:00 325834 20231025145110208.clean.inflight
2023-10-25 07:52:59 325834 20231025145110208.clean.requested
2023-10-25 19:41:02 42162636 20231025225741702.rollback
2023-10-25 16:13:52 0 20231025225741702.rollback.inflight
2023-10-25 16:13:50 66193583 20231025225741702.rollback.requested
2023-10-26 00:51:56 56178946 20231026050753360.rollback
2023-10-25 22:32:23 0 20231026050753360.rollback.inflight
2023-10-25 22:32:21 87498808 20231026050753360.rollback.requested
2023-10-26 05:26:47 56204991 20231026105209390.rollback
2023-10-26 04:21:34 0 20231026105209390.rollback.inflight
2023-10-26 04:21:31 87609453 20231026105209390.rollback.requested
2023-10-26 08:11:51 56207342 20231026143225288.rollback
2023-10-26 07:43:56 0 20231026143225288.rollback.inflight
2023-10-26 07:43:54 87607436 20231026143225288.rollback.requested
2023-10-26 12:55:52 56706727 20231026182236507.rollback
2023-10-26 12:02:30 0 20231026182236507.rollback.inflight
2023-10-26 12:02:27 88394276 20231026182236507.rollback.requested
2023-10-26 17:56:17 56207673 20231027000817536.rollback
2023-10-26 17:23:08 0 20231027000817536.rollback.inflight
2023-10-26 17:23:06 87626942 20231027000817536.rollback.requested
2023-10-27 00:34:06 0 20231027045612937.deltacommit.requested
2023-10-27 00:34:01 72095106 20231027045638746.rollback
2023-10-26 22:34:27 0 20231027045638746.rollback.inflight
2023-10-26 22:34:24 112381340 20231027045638746.rollback.requested
2023-10-12 15:58:17 0 archived_$folder$
2023-10-12 15:58:18 884 hoodie.properties
```
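To make the listing above easier to reason about, here is a minimal sketch that parses lines in this format (date, time, size, file name) and reports the size of each completed instant along with any instants that never completed. It assumes file names follow the `<instant>.<action>[.<state>]` pattern seen in the listing; the function names are hypothetical helpers, not part of Hudi.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   2023-10-25 07:51:07 647273329 20231025125551873.deltacommit
#   2023-10-25 06:34:37 255456811 20231025125551873.deltacommit.inflight
# Lines for folders, hoodie.properties, etc. do not match and are skipped.
LINE_RE = re.compile(
    r"^\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2}\s+(\d+)\s+"
    r"(\d+)\.(\w+)(?:\.(inflight|requested))?$"
)


def parse_timeline_listing(listing):
    """Return a list of (instant, action, state, size_bytes) tuples.

    state is None for a completed instant file.
    """
    entries = []
    for line in listing.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            size, instant, action, state = match.groups()
            entries.append((instant, action, state, int(size)))
    return entries


def completed_sizes(entries):
    """Map each action to its completed instants as (instant, size_bytes)."""
    by_action = defaultdict(list)
    for instant, action, state, size in entries:
        if state is None:
            by_action[action].append((instant, size))
    return dict(by_action)


def pending_instants(entries):
    """Instants with requested/inflight files but no completed file."""
    completed = {instant for instant, _, state, _ in entries if state is None}
    seen = {instant for instant, _, _, _ in entries}
    return sorted(seen - completed)
```

Run against the full listing, `completed_sizes(...)["deltacommit"]` shows the completed deltacommit files growing from about 13 KB on 2023-10-23 to over 600 MB by 2023-10-25, and `pending_instants(...)` returns the one deltacommit that only has a `.requested` file.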
Hudi properties:
```
#Properties saved on 2023-10-12T22:58:17.872Z
#Thu Oct 12 22:58:17 UTC 2023
hoodie.table.timeline.timezone=LOCAL
hoodie.table.keygenerator.class=org.apache.hudi.keygen.CustomKeyGenerator
hoodie.table.precombine.field=publishedAtUnixNano
hoodie.table.version=5
hoodie.database.name=
hoodie.datasource.write.hive_style_partitioning=true
hoodie.partition.metafile.use.base.format=false
hoodie.archivelog.folder=archived
hoodie.table.name=xxx
hoodie.compaction.payload.class=org.apache.hudi.common.model.DefaultHoodieRecordPayload
hoodie.populate.meta.fields=true
hoodie.table.type=MERGE_ON_READ
hoodie.datasource.write.partitionpath.urlencode=false
hoodie.table.base.file.format=PARQUET
hoodie.datasource.write.drop.partition.columns=false
hoodie.timeline.layout.version=1
hoodie.table.partition.fields=dt
hoodie.table.recordkey.fields=id.value
hoodie.table.checksum=3616660964
```
**Stacktrace**
See above.