kasured opened a new issue, #7246:
URL: https://github.com/apache/hudi/issues/7246

   **Describe the problem you faced**
   
   We started seeing OutOfMemory errors while finalizing writes to a Hudi COW table, during the archival step.
   
   The documentation is not quite clear on how to tune the retention period of the archival process, and hudi-cli does not have options to clear these archived commits.
   
   What are the options to unblock this process? We already have a lot of memory allocated, so we would prefer not to increase it even more.
   
   The .hoodie/archived folder already contains 200 files taking up 3 GB of space.
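
For reference, a minimal sketch of the write options that govern cleaning and archival retention (assuming Hudi 0.11.x configuration keys; the table name and all values below are hypothetical illustrations, not recommendations):

```python
# Hypothetical Hudi 0.11.x write options controlling cleaner and archival
# retention. Constraint to respect:
#   keep.max.commits > keep.min.commits > cleaner.commits.retained
hudi_options = {
    "hoodie.table.name": "my_table",            # hypothetical table name
    # Cleaner: number of commits whose file versions are retained
    "hoodie.cleaner.commits.retained": "10",
    # Archival: active-timeline size is kept between these two bounds
    "hoodie.keep.min.commits": "20",
    "hoodie.keep.max.commits": "30",
    # Number of archived instants written per batch; lowering this reduces
    # how much the archiver buffers in memory at once
    "hoodie.commits.archival.batch": "5",
}

# Usage with a Spark DataFrame writer (sketch):
# df.write.format("hudi").options(**hudi_options).mode("append").save(base_path)
```

The `hoodie.commits.archival.batch` setting is the one most directly related to memory pressure during archival, since it bounds how many instants are serialized into a single log block.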
   
   **Environment Description**
   
   * Hudi version : 0.11.0
   
   * Spark version : 0.11.0-amzn-1
   
   * Storage (HDFS/S3/GCS..) : S3
   
   * Running on Docker? (yes/no) : no
   
   **Stacktrace**
   
```
es]]

	at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:325)
	at org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:209)
Caused by: java.lang.OutOfMemoryError
	at java.io.ByteArrayOutputStream.hugeCapacity(ByteArrayOutputStream.java:123)
	at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:117)
	at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
	at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:153)
	at java.io.DataOutputStream.write(DataOutputStream.java:107)
	at java.io.FilterOutputStream.write(FilterOutputStream.java:97)
	at org.apache.hudi.common.table.log.block.HoodieAvroDataBlock.serializeRecords(HoodieAvroDataBlock.java:127)
	at org.apache.hudi.common.table.log.block.HoodieDataBlock.getContentBytes(HoodieDataBlock.java:131)
	at org.apache.hudi.common.table.log.HoodieLogFormatWriter.appendBlocks(HoodieLogFormatWriter.java:158)
	at org.apache.hudi.common.table.log.HoodieLogFormatWriter.appendBlock(HoodieLogFormatWriter.java:135)
	at org.apache.hudi.client.HoodieTimelineArchiver.writeToFile(HoodieTimelineArchiver.java:628)
	at org.apache.hudi.client.HoodieTimelineArchiver.archive(HoodieTimelineArchiver.java:600)
	at org.apache.hudi.client.HoodieTimelineArchiver.archiveIfRequired(HoodieTimelineArchiver.java:169)
	at org.apache.hudi.client.BaseHoodieWriteClient.archive(BaseHoodieWriteClient.java:907)
	at org.apache.hudi.client.BaseHoodieWriteClient.autoArchiveOnCommit(BaseHoodieWriteClient.java:629)
	at org.apache.hudi.client.BaseHoodieWriteClient.postCommit(BaseHoodieWriteClient.java:534)
	at org.apache.hudi.client.BaseHoodieWriteClient.commitStats(BaseHoodieWriteClient.java:236)
	at org.apache.hudi.client.SparkRDDWriteClient.commit(SparkRDDWriteClient.java:122)
	at org.apache.hudi.HoodieSparkSqlWriter$.commitAndPerformPostOperations(HoodieSparkSqlWriter.scala:678)
	at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:313)
	at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:165)
	at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:75)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala
```
   
   

