clownxc commented on PR #8472: URL: https://github.com/apache/hudi/pull/8472#issuecomment-1536932897
> @clownxc If I understand correctly, the memory savings are coming from dropping the "data" part of the HoodieRecord? I noticed that HoodieRecord has only 2 additional members - sealed (boolean) and data (t). Are the savings due to usage of the mock class (which may have bloating compared to the original HoodieRecord)?
>
> But hoodie write handles [deflate the HoodieRecord](https://github.com/apache/hudi/blob/cabcb2bf2cddedeb3a34047af3935b27cfdfb858/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieCreateHandle.java#L167) after writing, so the data portion should go away, reducing the amount of savings possible.
>
> Can you run the test again with these changes:
>
> 1. WriteStatus status = new WriteStatus(true, 1.0); // enable success record tracking as errors should be rare
> 2. Create an actual HoodieRecord and use that in the for loop instead of the mock(HoodieRecord.class)
> 3. Call deflate on the created HoodieRecord to remove the data as the write handles do.
>
> I feel the above may give a more realistic view of savings.
>
> Also, how did you find this interesting optimization? I am interested as there may be other avenues of such savings within HUDI, so it would be good to know how you track these.

I would be happy to do it.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
