gaoshihang opened a new issue, #7979:
URL: https://github.com/apache/hudi/issues/7979
I used hudi-cli (version 0.11.1) to run the `cleans show` command, and I got an OOM
exception:
```
hudi:ds_segments->cleans show
2023-02-15 02:33:00,699 INFO timeline.HoodieActiveTimeline: Loaded instants upto : Option{val=[20230214035435843__clean__COMPLETED]}
Command failed java.lang.OutOfMemoryError: Java heap space
Exception in thread "Spring Shell" java.lang.OutOfMemoryError: Java heap space
    at java.lang.StringCoding.decode(StringCoding.java:215)
    at java.lang.String.<init>(String.java:463)
    at org.apache.avro.util.Utf8.toString(Utf8.java:158)
    at org.apache.avro.io.BinaryDecoder.readString(BinaryDecoder.java:322)
    at org.apache.avro.io.ResolvingDecoder.readString(ResolvingDecoder.java:219)
    at org.apache.avro.generic.GenericDatumReader.readString(GenericDatumReader.java:456)
    at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:191)
    at org.apache.avro.generic.GenericDatumReader.readArray(GenericDatumReader.java:298)
    at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:183)
    at org.apache.avro.specific.SpecificDatumReader.readField(SpecificDatumReader.java:136)
    at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:247)
    at org.apache.avro.specific.SpecificDatumReader.readRecord(SpecificDatumReader.java:123)
    at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:179)
    at org.apache.avro.generic.GenericDatumReader.readMap(GenericDatumReader.java:354)
    at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:185)
    at org.apache.avro.specific.SpecificDatumReader.readField(SpecificDatumReader.java:136)
    at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:247)
    at org.apache.avro.specific.SpecificDatumReader.readRecord(SpecificDatumReader.java:123)
    at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:179)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:160)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)
    at org.apache.avro.file.DataFileStream.next(DataFileStream.java:251)
    at org.apache.avro.file.DataFileStream.next(DataFileStream.java:236)
    at org.apache.hudi.common.table.timeline.TimelineMetadataUtils.deserializeAvroMetadata(TimelineMetadataUtils.java:206)
    at org.apache.hudi.common.table.timeline.TimelineMetadataUtils.deserializeHoodieCleanMetadata(TimelineMetadataUtils.java:170)
    at org.apache.hudi.cli.commands.CleansCommand.showCleans(CleansCommand.java:74)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.springframework.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:216)
    at org.springframework.shell.core.SimpleExecutionStrategy.invoke(SimpleExecutionStrategy.java:68)
2023-02-15 02:36:14,433 INFO support.GenericApplicationContext: Closing org.springframework.context.support.GenericApplicationContext@47ef968d: startup date [Wed Feb 15 02:32:39 UTC 2023]; root of context hierarchy
2023-02-15 02:36:14,435 INFO support.DefaultLifecycleProcessor: Stopping beans in phase 1
2023-02-15 02:36:14,441 INFO impl.MetricsSystemImpl: Stopping s3a-file-system metrics system...
2023-02-15 02:36:14,441 INFO impl.MetricsSystemImpl: s3a-file-system metrics system stopped.
2023-02-15 02:36:14,442 INFO impl.MetricsSystemImpl: s3a-file-system metrics system shutdown complete.
```
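Until the command itself is fixed, raising the CLI's JVM heap may work around the failure. This is only a sketch under an assumption: that the hudi-cli launcher script forwards standard JVM options from the environment (the exact variable name can differ between packagings, so check your `hudi-cli.sh` before relying on it):

```shell
# Assumption: the launcher forwards JAVA_OPTS to the JVM it starts;
# verify against your hudi-cli.sh, as the variable name may differ.
export JAVA_OPTS="-Xmx8g"
```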
Then I checked the code in CleansCommand.java and found that `cleans show`
first collects all completed clean instants and deserializes the Avro metadata
for every one of them, which causes the OOM:
```java
HoodieActiveTimeline activeTimeline = HoodieCLI.getTableMetaClient().getActiveTimeline();
HoodieTimeline timeline = activeTimeline.getCleanerTimeline().filterCompletedInstants();
List<HoodieInstant> cleans = timeline.getReverseOrderedInstants().collect(Collectors.toList());
List<Comparable[]> rows = new ArrayList<>();
for (HoodieInstant clean : cleans) {
  HoodieCleanMetadata cleanMetadata =
      TimelineMetadataUtils.deserializeHoodieCleanMetadata(timeline.getInstantDetails(clean).get());
  rows.add(new Comparable[] {clean.getTimestamp(), cleanMetadata.getEarliestCommitToRetain(),
      cleanMetadata.getTotalFilesDeleted(), cleanMetadata.getTimeTakenInMillis()});
  cleanMetadata = null;
}
TableHeader header = new TableHeader().addTableHeaderField(HoodieTableHeaderFields.HEADER_CLEAN_TIME)
    .addTableHeaderField(HoodieTableHeaderFields.HEADER_EARLIEST_COMMAND_RETAINED)
    .addTableHeaderField(HoodieTableHeaderFields.HEADER_TOTAL_FILES_DELETED)
    .addTableHeaderField(HoodieTableHeaderFields.HEADER_TOTAL_TIME_TAKEN);
return HoodiePrintHelper.print(header, new HashMap<>(), sortByField, descending, limit, headerOnly, rows);
```
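One possible direction for a fix is to apply the display limit before the per-instant deserialization, so only the instants that will actually be printed are ever loaded. The following is a minimal, self-contained sketch of that pattern; `buildRows` and `deserialize` are hypothetical stand-ins, not Hudi's actual API:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class LazyCleanRows {

  // Hypothetical stand-in for the expensive Avro deserialization of one
  // clean instant's metadata.
  static String deserialize(String instant) {
    return "metadata-for-" + instant;
  }

  // Applies the limit BEFORE the expensive per-instant step: because Java
  // streams are lazy, deserialize() runs only for the first `limit` instants
  // instead of for the whole timeline.
  static List<String> buildRows(List<String> instants, int limit) {
    return instants.stream()
        .limit(limit)                       // truncate the stream first
        .map(LazyCleanRows::deserialize)    // then deserialize the survivors
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<String> instants = Arrays.asList("20230214", "20230213", "20230212");
    System.out.println(buildRows(instants, 2));
    // Only two of the three instants were deserialized.
  }
}
```

With this shape, memory use scales with the `limit` argument of `cleans show` rather than with the total number of completed clean instants on the timeline.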
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]