gaoshihang opened a new issue, #7979:
URL: https://github.com/apache/hudi/issues/7979

   I use hudi-cli (version 0.11.1) to run the `cleans show` command, and I get an OOM exception:
   ```
   hudi:ds_segments->cleans show
   2023-02-15 02:33:00,699 INFO timeline.HoodieActiveTimeline: Loaded instants upto : Option{val=[20230214035435843__clean__COMPLETED]}
   Command failed java.lang.OutOfMemoryError: Java heap space
   Exception in thread "Spring Shell" java.lang.OutOfMemoryError: Java heap space
       at java.lang.StringCoding.decode(StringCoding.java:215)
       at java.lang.String.<init>(String.java:463)
       at org.apache.avro.util.Utf8.toString(Utf8.java:158)
       at org.apache.avro.io.BinaryDecoder.readString(BinaryDecoder.java:322)
       at org.apache.avro.io.ResolvingDecoder.readString(ResolvingDecoder.java:219)
       at org.apache.avro.generic.GenericDatumReader.readString(GenericDatumReader.java:456)
       at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:191)
       at org.apache.avro.generic.GenericDatumReader.readArray(GenericDatumReader.java:298)
       at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:183)
       at org.apache.avro.specific.SpecificDatumReader.readField(SpecificDatumReader.java:136)
       at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:247)
       at org.apache.avro.specific.SpecificDatumReader.readRecord(SpecificDatumReader.java:123)
       at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:179)
       at org.apache.avro.generic.GenericDatumReader.readMap(GenericDatumReader.java:354)
       at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:185)
       at org.apache.avro.specific.SpecificDatumReader.readField(SpecificDatumReader.java:136)
       at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:247)
       at org.apache.avro.specific.SpecificDatumReader.readRecord(SpecificDatumReader.java:123)
       at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:179)
       at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:160)
       at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)
       at org.apache.avro.file.DataFileStream.next(DataFileStream.java:251)
       at org.apache.avro.file.DataFileStream.next(DataFileStream.java:236)
       at org.apache.hudi.common.table.timeline.TimelineMetadataUtils.deserializeAvroMetadata(TimelineMetadataUtils.java:206)
       at org.apache.hudi.common.table.timeline.TimelineMetadataUtils.deserializeHoodieCleanMetadata(TimelineMetadataUtils.java:170)
       at org.apache.hudi.cli.commands.CleansCommand.showCleans(CleansCommand.java:74)
       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
       at java.lang.reflect.Method.invoke(Method.java:498)
       at org.springframework.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:216)
       at org.springframework.shell.core.SimpleExecutionStrategy.invoke(SimpleExecutionStrategy.java:68)
   2023-02-15 02:36:14,433 INFO support.GenericApplicationContext: Closing org.springframework.context.support.GenericApplicationContext@47ef968d: startup date [Wed Feb 15 02:32:39 UTC 2023]; root of context hierarchy
   2023-02-15 02:36:14,435 INFO support.DefaultLifecycleProcessor: Stopping beans in phase 1
   2023-02-15 02:36:14,441 INFO impl.MetricsSystemImpl: Stopping s3a-file-system metrics system...
   2023-02-15 02:36:14,441 INFO impl.MetricsSystemImpl: s3a-file-system metrics system stopped.
   2023-02-15 02:36:14,442 INFO impl.MetricsSystemImpl: s3a-file-system metrics system shutdown complete.
   ```
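   As a temporary workaround (not a fix), raising the CLI's heap may get past the error. `JAVA_TOOL_OPTIONS` is read by any HotSpot JVM at startup, so it works regardless of how the launcher script assembles its `java` command line; the script path below is an assumption, adjust it to your install:

   ```shell
   # JAVA_TOOL_OPTIONS is picked up by the JVM itself, so no launcher-script
   # edits are needed; size -Xmx to the amount of clean metadata on the timeline.
   export JAVA_TOOL_OPTIONS="-Xmx8g"
   ./hudi-cli.sh   # launcher path assumed; use your hudi-cli entry point
   ```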
   
   Then I checked the code in CleansCommand.java and found that `cleans show` first collects every completed clean instant and then deserializes the Avro metadata for each one up front, which is what causes the OOM:
   ```java
       HoodieActiveTimeline activeTimeline = HoodieCLI.getTableMetaClient().getActiveTimeline();
       HoodieTimeline timeline = activeTimeline.getCleanerTimeline().filterCompletedInstants();
       List<HoodieInstant> cleans = timeline.getReverseOrderedInstants().collect(Collectors.toList());
       List<Comparable[]> rows = new ArrayList<>();
       // This loop deserializes every completed clean's metadata before any limit is applied.
       for (HoodieInstant clean : cleans) {
         HoodieCleanMetadata cleanMetadata =
             TimelineMetadataUtils.deserializeHoodieCleanMetadata(timeline.getInstantDetails(clean).get());
         rows.add(new Comparable[]{clean.getTimestamp(), cleanMetadata.getEarliestCommitToRetain(),
             cleanMetadata.getTotalFilesDeleted(), cleanMetadata.getTimeTakenInMillis()});
         cleanMetadata = null;
       }

       TableHeader header =
           new TableHeader().addTableHeaderField(HoodieTableHeaderFields.HEADER_CLEAN_TIME)
               .addTableHeaderField(HoodieTableHeaderFields.HEADER_EARLIEST_COMMAND_RETAINED)
               .addTableHeaderField(HoodieTableHeaderFields.HEADER_TOTAL_FILES_DELETED)
               .addTableHeaderField(HoodieTableHeaderFields.HEADER_TOTAL_TIME_TAKEN);
       return HoodiePrintHelper.print(header, new HashMap<>(), sortByField, descending, limit, headerOnly, rows);
   ```
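   One possible direction for a fix would be to apply the command's `limit` before deserializing, so only the instants that will actually be printed are read into memory. The sketch below uses plain Java streams with a hypothetical stand-in for `deserializeHoodieCleanMetadata` (no Hudi dependency, names invented for illustration) to show why putting `limit()` before the expensive `map()` bounds the work:

   ```java
   import java.util.List;
   import java.util.concurrent.atomic.AtomicInteger;
   import java.util.stream.Collectors;
   import java.util.stream.IntStream;

   public class LimitBeforeDeserialize {
       // Counts how many "instants" we actually deserialize.
       static final AtomicInteger DESERIALIZED = new AtomicInteger();

       // Hypothetical stand-in for deserializing one clean instant's Avro metadata;
       // in the real CLI this allocates a full HoodieCleanMetadata record.
       static String deserialize(int instant) {
           DESERIALIZED.incrementAndGet();
           return "metadata-" + instant;
       }

       public static void main(String[] args) {
           int totalInstants = 10_000; // many cleans on the timeline
           int limit = 10;             // the rows the user will actually see

           // Because streams are lazy, limit() before map() means only
           // `limit` instants are ever deserialized, keeping heap bounded.
           List<String> rows = IntStream.range(0, totalInstants)
               .boxed()
               .limit(limit)
               .map(LimitBeforeDeserialize::deserialize)
               .collect(Collectors.toList());

           System.out.println(rows.size() + " rows, " + DESERIALIZED.get() + " deserialized");
       }
   }
   ```

   In the actual command the limit is only applied later, inside `HoodiePrintHelper.print`, after every row has already been built, so it does not help here.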
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
