kasured opened a new issue, #7246: URL: https://github.com/apache/hudi/issues/7246
**Describe the problem you faced**

We started seeing OutOfMemory errors while finalizing writes to a Hudi COW table, during the archival step. The documentation is not clear on how to tune the retention period of the archival process, and hudi-cli has no option to clear these archived commits. What are our options to unblock this process? We already allocate a lot of memory, so we would prefer not to increase it further. The `.hoodie/archived` folder already contains 200 files taking up 3 GB.

**Environment Description**

* Hudi version : 0.11.0
* Spark version : 0.11.0-amzn-1
* Storage (HDFS/S3/GCS..) : S3
* Running on Docker? (yes/no) : no

**Stacktrace**

```
es]]

	at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:325)
	at org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:209)
Caused by: java.lang.OutOfMemoryError
	at java.io.ByteArrayOutputStream.hugeCapacity(ByteArrayOutputStream.java:123)
	at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:117)
	at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
	at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:153)
	at java.io.DataOutputStream.write(DataOutputStream.java:107)
	at java.io.FilterOutputStream.write(FilterOutputStream.java:97)
	at org.apache.hudi.common.table.log.block.HoodieAvroDataBlock.serializeRecords(HoodieAvroDataBlock.java:127)
	at org.apache.hudi.common.table.log.block.HoodieDataBlock.getContentBytes(HoodieDataBlock.java:131)
	at org.apache.hudi.common.table.log.HoodieLogFormatWriter.appendBlocks(HoodieLogFormatWriter.java:158)
	at org.apache.hudi.common.table.log.HoodieLogFormatWriter.appendBlock(HoodieLogFormatWriter.java:135)
	at org.apache.hudi.client.HoodieTimelineArchiver.writeToFile(HoodieTimelineArchiver.java:628)
	at org.apache.hudi.client.HoodieTimelineArchiver.archive(HoodieTimelineArchiver.java:600)
	at org.apache.hudi.client.HoodieTimelineArchiver.archiveIfRequired(HoodieTimelineArchiver.java:169)
	at org.apache.hudi.client.BaseHoodieWriteClient.archive(BaseHoodieWriteClient.java:907)
	at org.apache.hudi.client.BaseHoodieWriteClient.autoArchiveOnCommit(BaseHoodieWriteClient.java:629)
	at org.apache.hudi.client.BaseHoodieWriteClient.postCommit(BaseHoodieWriteClient.java:534)
	at org.apache.hudi.client.BaseHoodieWriteClient.commitStats(BaseHoodieWriteClient.java:236)
	at org.apache.hudi.client.SparkRDDWriteClient.commit(SparkRDDWriteClient.java:122)
	at org.apache.hudi.HoodieSparkSqlWriter$.commitAndPerformPostOperations(HoodieSparkSqlWriter.scala:678)
	at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:313)
	at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:165)
	at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:75)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala
```
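For anyone landing on this issue: the archival/cleaner behavior above is governed by a handful of write options. Below is a minimal sketch of those knobs, assuming the option names as documented in the Hudi 0.11 configuration reference (verify them against your exact build); the numeric values are purely illustrative, not a recommendation for this table.

```python
# Sketch: Hudi timeline-archival and cleaner tuning options (names per the
# Hudi 0.11 config reference; values illustrative only).
hudi_archival_opts = {
    # Archival triggers once the active timeline exceeds this many commits...
    "hoodie.keep.max.commits": "30",
    # ...and trims the active timeline back down to this many commits.
    "hoodie.keep.min.commits": "20",
    # Cleaner retention must stay below hoodie.keep.min.commits, otherwise
    # archival is blocked waiting on the cleaner.
    "hoodie.cleaner.commits.retained": "10",
    # Instants archived per batch; a smaller batch keeps the in-memory Avro
    # block (the ByteArrayOutputStream growing in the stacktrace) smaller.
    "hoodie.commits.archival.batch": "5",
}

# Illustrative write call (df is an existing Spark DataFrame):
# (df.write.format("hudi")
#    .options(**hudi_archival_opts)
#    .mode("append")
#    .save("s3://your-bucket/path/to/table"))
```

The invariant to check first is the ordering `cleaner.commits.retained < keep.min.commits < keep.max.commits`; violating it can stall archival entirely rather than merely slowing it.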
