ehurheap commented on issue #9796:
URL: https://github.com/apache/hudi/issues/9796#issuecomment-1743294080
We have not tried to manually roll back any of the rollback commits, but the job did make progress, so I assume the retries were eventually successful.
After several hours the job did fail, with this exception (which seems to come from a later stage, the cleaner):
```
23/10/01 22:24:29 ERROR HoodieCleaner: Fail to run cleaning for s3://bucket-redacted/tablepath-redacted
java.lang.OutOfMemoryError: null
    at java.io.ByteArrayOutputStream.hugeCapacity(ByteArrayOutputStream.java:123) ~[?:1.8.0_382]
    at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:117) ~[?:1.8.0_382]
    at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93) ~[?:1.8.0_382]
    at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:153) ~[?:1.8.0_382]
    at org.apache.avro.io.DirectBinaryEncoder.writeFixed(DirectBinaryEncoder.java:124) ~[avro-1.11.0.jar:1.11.0]
    at org.apache.avro.io.BinaryEncoder.writeString(BinaryEncoder.java:57) ~[avro-1.11.0.jar:1.11.0]
    at org.apache.avro.io.Encoder.writeString(Encoder.java:130) ~[avro-1.11.0.jar:1.11.0]
    at org.apache.avro.generic.GenericDatumWriter.writeString(GenericDatumWriter.java:346) ~[avro-1.11.0.jar:1.11.0]
    at org.apache.avro.specific.SpecificDatumWriter.writeString(SpecificDatumWriter.java:72) ~[avro-1.11.0.jar:1.11.0]
    at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:151) ~[avro-1.11.0.jar:1.11.0]
    at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:83) ~[avro-1.11.0.jar:1.11.0]
    at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:145) ~[avro-1.11.0.jar:1.11.0]
    at org.apache.avro.specific.SpecificDatumWriter.writeField(SpecificDatumWriter.java:100) ~[avro-1.11.0.jar:1.11.0]
    at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:210) ~[avro-1.11.0.jar:1.11.0]
    at org.apache.avro.specific.SpecificDatumWriter.writeRecord(SpecificDatumWriter.java:84) ~[avro-1.11.0.jar:1.11.0]
    at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:131) ~[avro-1.11.0.jar:1.11.0]
    at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:83) ~[avro-1.11.0.jar:1.11.0]
    at org.apache.avro.generic.GenericDatumWriter.writeArray(GenericDatumWriter.java:257) ~[avro-1.11.0.jar:1.11.0]
    at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:137) ~[avro-1.11.0.jar:1.11.0]
    at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:83) ~[avro-1.11.0.jar:1.11.0]
    at org.apache.avro.generic.GenericDatumWriter.writeMap(GenericDatumWriter.java:305) ~[avro-1.11.0.jar:1.11.0]
    at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:140) ~[avro-1.11.0.jar:1.11.0]
    at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:83) ~[avro-1.11.0.jar:1.11.0]
    at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:145) ~[avro-1.11.0.jar:1.11.0]
    at org.apache.avro.specific.SpecificDatumWriter.writeField(SpecificDatumWriter.java:100) ~[avro-1.11.0.jar:1.11.0]
    at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:210) ~[avro-1.11.0.jar:1.11.0]
    at org.apache.avro.specific.SpecificDatumWriter.writeRecord(SpecificDatumWriter.java:84) ~[avro-1.11.0.jar:1.11.0]
    at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:131) ~[avro-1.11.0.jar:1.11.0]
    at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:83) ~[avro-1.11.0.jar:1.11.0]
    at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:73) ~[avro-1.11.0.jar:1.11.0]
    at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:314) ~[avro-1.11.0.jar:1.11.0]
    at org.apache.hudi.common.table.timeline.TimelineMetadataUtils.serializeAvroMetadata(TimelineMetadataUtils.java:159) ~[__app__.jar:0.13.0]
    at org.apache.hudi.common.table.timeline.TimelineMetadataUtils.serializeCleanerPlan(TimelineMetadataUtils.java:114) ~[__app__.jar:0.13.0]
    at org.apache.hudi.table.action.clean.CleanPlanActionExecutor.requestClean(CleanPlanActionExecutor.java:158) ~[__app__.jar:0.13.0]
    at org.apache.hudi.table.action.clean.CleanPlanActionExecutor.execute(CleanPlanActionExecutor.java:176) ~[__app__.jar:0.13.0]
    at org.apache.hudi.table.HoodieSparkCopyOnWriteTable.scheduleCleaning(HoodieSparkCopyOnWriteTable.java:198) ~[__app__.jar:0.13.0]
    at org.apache.hudi.client.BaseHoodieTableServiceClient.scheduleTableServiceInternal(BaseHoodieTableServiceClient.java:430) ~[__app__.jar:0.13.0]
    at org.apache.hudi.client.BaseHoodieTableServiceClient.clean(BaseHoodieTableServiceClient.java:543) ~[__app__.jar:0.13.0]
    at org.apache.hudi.client.BaseHoodieWriteClient.clean(BaseHoodieWriteClient.java:758) ~[__app__.jar:0.13.0]
    at org.apache.hudi.client.BaseHoodieWriteClient.clean(BaseHoodieWriteClient.java:730) ~[__app__.jar:0.13.0]
    at org.apache.hudi.client.BaseHoodieWriteClient.clean(BaseHoodieWriteClient.java:762) ~[__app__.jar:0.13.0]
    at org.apache.hudi.utilities.HoodieCleaner.run(HoodieCleaner.java:69) ~[__app__.jar:0.13.0]
    at org.apache.hudi.utilities.HoodieCleaner.main(HoodieCleaner.java:111) ~[__app__.jar:0.13.0]
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_382]
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_382]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_382]
    at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_382]
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:742) ~[spark-yarn_2.12-3.3.0-amzn-0.jar:3.3.0-amzn-0]
23/10/01 22:24:29 INFO SparkUI: Stopped Spark web UI at http://ip-10-18-33-241.heap:8090
23/10/01 22:24:29 INFO YarnClusterSchedulerBackend: Shutting down all executors
23/10/01 22:24:29 INFO YarnSchedulerBackend$YarnDriverEndpoint: Asking each executor to shut down
23/10/01 22:24:29 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
23/10/01 22:24:29 INFO MemoryStore: MemoryStore cleared
23/10/01 22:24:29 INFO BlockManager: BlockManager stopped
23/10/01 22:24:29 INFO BlockManagerMaster: BlockManagerMaster stopped
23/10/01 22:24:29 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
23/10/01 22:24:29 INFO SparkContext: Successfully stopped SparkContext
```
This is not the first time we have hit an `OutOfMemoryError` caused by Java's `ByteArrayOutputStream` capacity limit of roughly 2 GB (a single backing `byte[]` capped at `Integer.MAX_VALUE` bytes). It seems that Hudi should handle serializing such large metadata payloads more gracefully. Are there any plans to address this?
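To make the failure mode concrete, here is a minimal standalone sketch (my own illustration, not Hudi code) of the JDK 8 growth logic that produces `java.lang.OutOfMemoryError: null`. The Avro `DataFileWriter` in the trace above buffers the whole serialized cleaner plan into one `ByteArrayOutputStream`, so once the plan exceeds the array cap, the capacity request overflows:

```java
// Why the message is "null": JDK 8's ByteArrayOutputStream backs the stream
// with a single byte[], so its total size is capped at Integer.MAX_VALUE
// bytes (~2 GB). When a write pushes the required capacity past that, the
// int overflows to a negative value and a message-less OutOfMemoryError is
// thrown. This is a sketch mirroring the JDK logic, not the JDK source.
public class BaosLimitDemo {

    // Mirrors java.io.ByteArrayOutputStream.hugeCapacity from JDK 8.
    static int hugeCapacity(int minCapacity) {
        if (minCapacity < 0) {                    // int overflow past 2^31 - 1
            throw new OutOfMemoryError();         // no message => ": null" in logs
        }
        return (minCapacity > Integer.MAX_VALUE - 8)
                ? Integer.MAX_VALUE
                : Integer.MAX_VALUE - 8;
    }

    public static void main(String[] args) {
        // Simulate a buffer already near the cap receiving one more chunk,
        // as happens when Avro serializes a very large cleaner plan into a
        // single in-memory stream.
        int currentSize = Integer.MAX_VALUE - 16;
        int chunk = 1024;
        int required = currentSize + chunk;       // overflows to a negative int
        hugeCapacity(required);                   // throws OutOfMemoryError: null
    }
}
```

Note this is a hard cap of the single backing array, not heap exhaustion, so giving the cleaner driver more memory would not help once the serialized plan itself approaches 2 GB.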
I am going to try running the cleaner again with retention configs (`hoodie.cleaner.commits.retained`, `hoodie.keep.min.commits`, `hoodie.keep.max.commits`) tuned so that each run requests fewer files to be cleaned, to try to catch up, roughly as sketched below.
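For the record, a sketch of how I intend to invoke it. The retention values are illustrative placeholders to be tuned for our table, and I'm assuming the standard `HoodieCleaner` CLI flags from hudi-utilities 0.13.0:

```sh
# Sketch only: retention values are placeholders, not a recommendation.
# hoodie.keep.min.commits should stay greater than
# hoodie.cleaner.commits.retained so archival and cleaning don't conflict.
spark-submit \
  --class org.apache.hudi.utilities.HoodieCleaner \
  __app__.jar \
  --target-base-path s3://bucket-redacted/tablepath-redacted \
  --hoodie-conf hoodie.cleaner.commits.retained=10 \
  --hoodie-conf hoodie.keep.min.commits=20 \
  --hoodie-conf hoodie.keep.max.commits=30
```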