[GitHub] [hudi] prashantwason commented on pull request #8472: [HUDI-5298] Optimize WriteStatus storing HoodieRecord

via GitHub Fri, 05 May 2023 11:02:32 -0700


prashantwason commented on PR #8472:
URL: https://github.com/apache/hudi/pull/8472#issuecomment-1536586226

@clownxc If I understand correctly, the memory savings come from dropping
the "data" part of the HoodieRecord? I noticed that HoodieRecord has only 2
additional members - sealed (boolean) and data (T). Or are the savings due to
the use of the mock class (which may be bloated compared to a real
HoodieRecord)?

But Hudi write handles
[deflate the HoodieRecord](https://github.com/apache/hudi/blob/cabcb2bf2cddedeb3a34047af3935b27cfdfb858/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieCreateHandle.java#L167)
after writing, so the data portion should already be gone by then, which
reduces the savings possible.

Can you run the test again with these changes:
1. WriteStatus status = new WriteStatus(true, 1.0); // enable success
record tracking as errors should be rare
2. Create an actual HoodieRecord and use that in the for loop instead of
the mock(HoodieRecord.class)
3. Call deflate on the created HoodieRecord to remove the data, as the write
handles do.

I feel the above may give a more realistic view of savings.
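
To illustrate why deflating matters for the measurement, here is a minimal,
self-contained sketch of the three steps above. It deliberately does not use
the Hudi classes: SketchRecord and SketchWriteStatus are hypothetical
stand-ins that only model the members discussed here (key, data, sealed, and
a list of tracked records), so the numbers it produces say nothing about the
real HoodieRecord or WriteStatus.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for HoodieRecord; it only models the fields discussed above.
final class SketchRecord<T> {
    private final String recordKey;
    private final String partitionPath;
    private T data;          // payload, assumed to dominate the record's footprint
    private boolean sealed;

    SketchRecord(String recordKey, String partitionPath, T data) {
        this.recordKey = recordKey;
        this.partitionPath = partitionPath;
        this.data = data;
        this.sealed = false;
    }

    // Models the effect of deflating: drop the payload reference so it can be collected.
    void deflate() {
        this.data = null;
    }

    boolean hasData() {
        return data != null;
    }

    String getRecordKey() {
        return recordKey;
    }
}

// Hypothetical stand-in for a write status that optionally tracks written records.
final class SketchWriteStatus {
    private final boolean trackSuccessRecords;
    private final List<SketchRecord<?>> writtenRecords = new ArrayList<>();

    SketchWriteStatus(boolean trackSuccessRecords) {
        this.trackSuccessRecords = trackSuccessRecords;
    }

    void markSuccess(SketchRecord<?> record) {
        if (trackSuccessRecords) {
            writtenRecords.add(record);
        }
    }

    List<SketchRecord<?>> getWrittenRecords() {
        return writtenRecords;
    }
}

final class DeflateSketch {
    // Builds real (non-mock) records, optionally deflates them, then tracks them.
    static SketchWriteStatus writeBatch(int numRecords, int payloadBytes, boolean deflate) {
        SketchWriteStatus status = new SketchWriteStatus(true);
        for (int i = 0; i < numRecords; i++) {
            SketchRecord<byte[]> record =
                new SketchRecord<>("key-" + i, "partition-0", new byte[payloadBytes]);
            if (deflate) {
                record.deflate();
            }
            status.markSuccess(record);
        }
        return status;
    }

    // Counts tracked records that still hold a payload reference.
    static long countRetainedPayloads(SketchWriteStatus status) {
        return status.getWrittenRecords().stream().filter(SketchRecord::hasData).count();
    }

    public static void main(String[] args) {
        System.out.println("retained payloads, deflated: "
            + countRetainedPayloads(writeBatch(1000, 1024, true)));
        System.out.println("retained payloads, not deflated: "
            + countRetainedPayloads(writeBatch(1000, 1024, false)));
    }
}
```

If the tracked records are already deflated, only the key and location
portion stays reachable from the status, which is the baseline the test
should compare against.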

Also, how did you find this interesting optimization? I am interested because
there may be other avenues for such savings within Hudi, so it would be good
to know how you track these down.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
