ehurheap commented on issue #9796:
URL: https://github.com/apache/hudi/issues/9796#issuecomment-1743294080

   We have not tried to manually roll back any rollback commits, but the job did make progress, so I assume the retries were eventually successful.
   
   After several hours the job eventually failed with the exception below, which appears to come from a later stage (the cleaner):
   
   ```
   23/10/01 22:24:29 ERROR HoodieCleaner: Fail to run cleaning for s3://bucket-redacted/tablepath-redacted
   java.lang.OutOfMemoryError: null
        at java.io.ByteArrayOutputStream.hugeCapacity(ByteArrayOutputStream.java:123) ~[?:1.8.0_382]
        at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:117) ~[?:1.8.0_382]
        at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93) ~[?:1.8.0_382]
        at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:153) ~[?:1.8.0_382]
        at org.apache.avro.io.DirectBinaryEncoder.writeFixed(DirectBinaryEncoder.java:124) ~[avro-1.11.0.jar:1.11.0]
        at org.apache.avro.io.BinaryEncoder.writeString(BinaryEncoder.java:57) ~[avro-1.11.0.jar:1.11.0]
        at org.apache.avro.io.Encoder.writeString(Encoder.java:130) ~[avro-1.11.0.jar:1.11.0]
        at org.apache.avro.generic.GenericDatumWriter.writeString(GenericDatumWriter.java:346) ~[avro-1.11.0.jar:1.11.0]
        at org.apache.avro.specific.SpecificDatumWriter.writeString(SpecificDatumWriter.java:72) ~[avro-1.11.0.jar:1.11.0]
        at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:151) ~[avro-1.11.0.jar:1.11.0]
        at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:83) ~[avro-1.11.0.jar:1.11.0]
        at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:145) ~[avro-1.11.0.jar:1.11.0]
        at org.apache.avro.specific.SpecificDatumWriter.writeField(SpecificDatumWriter.java:100) ~[avro-1.11.0.jar:1.11.0]
        at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:210) ~[avro-1.11.0.jar:1.11.0]
        at org.apache.avro.specific.SpecificDatumWriter.writeRecord(SpecificDatumWriter.java:84) ~[avro-1.11.0.jar:1.11.0]
        at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:131) ~[avro-1.11.0.jar:1.11.0]
        at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:83) ~[avro-1.11.0.jar:1.11.0]
        at org.apache.avro.generic.GenericDatumWriter.writeArray(GenericDatumWriter.java:257) ~[avro-1.11.0.jar:1.11.0]
        at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:137) ~[avro-1.11.0.jar:1.11.0]
        at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:83) ~[avro-1.11.0.jar:1.11.0]
        at org.apache.avro.generic.GenericDatumWriter.writeMap(GenericDatumWriter.java:305) ~[avro-1.11.0.jar:1.11.0]
        at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:140) ~[avro-1.11.0.jar:1.11.0]
        at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:83) ~[avro-1.11.0.jar:1.11.0]
        at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:145) ~[avro-1.11.0.jar:1.11.0]
        at org.apache.avro.specific.SpecificDatumWriter.writeField(SpecificDatumWriter.java:100) ~[avro-1.11.0.jar:1.11.0]
        at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:210) ~[avro-1.11.0.jar:1.11.0]
        at org.apache.avro.specific.SpecificDatumWriter.writeRecord(SpecificDatumWriter.java:84) ~[avro-1.11.0.jar:1.11.0]
        at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:131) ~[avro-1.11.0.jar:1.11.0]
        at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:83) ~[avro-1.11.0.jar:1.11.0]
        at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:73) ~[avro-1.11.0.jar:1.11.0]
        at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:314) ~[avro-1.11.0.jar:1.11.0]
        at org.apache.hudi.common.table.timeline.TimelineMetadataUtils.serializeAvroMetadata(TimelineMetadataUtils.java:159) ~[__app__.jar:0.13.0]
        at org.apache.hudi.common.table.timeline.TimelineMetadataUtils.serializeCleanerPlan(TimelineMetadataUtils.java:114) ~[__app__.jar:0.13.0]
        at org.apache.hudi.table.action.clean.CleanPlanActionExecutor.requestClean(CleanPlanActionExecutor.java:158) ~[__app__.jar:0.13.0]
        at org.apache.hudi.table.action.clean.CleanPlanActionExecutor.execute(CleanPlanActionExecutor.java:176) ~[__app__.jar:0.13.0]
        at org.apache.hudi.table.HoodieSparkCopyOnWriteTable.scheduleCleaning(HoodieSparkCopyOnWriteTable.java:198) ~[__app__.jar:0.13.0]
        at org.apache.hudi.client.BaseHoodieTableServiceClient.scheduleTableServiceInternal(BaseHoodieTableServiceClient.java:430) ~[__app__.jar:0.13.0]
        at org.apache.hudi.client.BaseHoodieTableServiceClient.clean(BaseHoodieTableServiceClient.java:543) ~[__app__.jar:0.13.0]
        at org.apache.hudi.client.BaseHoodieWriteClient.clean(BaseHoodieWriteClient.java:758) ~[__app__.jar:0.13.0]
        at org.apache.hudi.client.BaseHoodieWriteClient.clean(BaseHoodieWriteClient.java:730) ~[__app__.jar:0.13.0]
        at org.apache.hudi.client.BaseHoodieWriteClient.clean(BaseHoodieWriteClient.java:762) ~[__app__.jar:0.13.0]
        at org.apache.hudi.utilities.HoodieCleaner.run(HoodieCleaner.java:69) ~[__app__.jar:0.13.0]
        at org.apache.hudi.utilities.HoodieCleaner.main(HoodieCleaner.java:111) ~[__app__.jar:0.13.0]
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_382]
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_382]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_382]
        at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_382]
        at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:742) ~[spark-yarn_2.12-3.3.0-amzn-0.jar:3.3.0-amzn-0]
   23/10/01 22:24:29 INFO SparkUI: Stopped Spark web UI at http://ip-10-18-33-241.heap:8090
   23/10/01 22:24:29 INFO YarnClusterSchedulerBackend: Shutting down all executors
   23/10/01 22:24:29 INFO YarnSchedulerBackend$YarnDriverEndpoint: Asking each executor to shut down
   23/10/01 22:24:29 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
   23/10/01 22:24:29 INFO MemoryStore: MemoryStore cleared
   23/10/01 22:24:29 INFO BlockManager: BlockManager stopped
   23/10/01 22:24:29 INFO BlockManagerMaster: BlockManagerMaster stopped
   23/10/01 22:24:29 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
   23/10/01 22:24:29 INFO SparkContext: Successfully stopped SparkContext
   ```
   
   This is not the first time we have hit an `OutOfMemoryError` caused by the Java `ByteArrayOutputStream` limit of roughly 2 GB (a single `byte[]` cannot hold more than `Integer.MAX_VALUE` bytes). It seems that Hudi should handle large metadata payloads like this clean plan more gracefully - are there any plans to address this?
   
   I am going to try running the cleaner again with the cleaner configs (`hoodie.cleaner.commits.retained`, `hoodie.keep.min.commits`, `hoodie.keep.max.commits`) tuned so that each run requests fewer files to be cleaned, to try to catch up.
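   Roughly, the plan is an invocation like the following (values are illustrative placeholders, not recommendations, and this assumes the standalone `HoodieCleaner` from the hudi-utilities bundle accepts `--target-base-path` and `--hoodie-conf` overrides as in 0.13.0; note the constraint `hoodie.cleaner.commits.retained` < `hoodie.keep.min.commits` < `hoodie.keep.max.commits`):

   ```shell
   # Standalone cleaner run with tighter retention so each clean plan
   # covers fewer files per run.
   spark-submit \
     --class org.apache.hudi.utilities.HoodieCleaner \
     hudi-utilities-bundle.jar \
     --target-base-path s3://bucket-redacted/tablepath-redacted \
     --hoodie-conf hoodie.cleaner.commits.retained=10 \
     --hoodie-conf hoodie.keep.min.commits=20 \
     --hoodie-conf hoodie.keep.max.commits=30
   ```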
   

