prashantwason commented on PR #8472:
URL: https://github.com/apache/hudi/pull/8472#issuecomment-1536586226

   @clownxc If I understand correctly, the memory savings come from 
dropping the "data" part of the HoodieRecord? I noticed that HoodieRecord has 
only 2 additional members - sealed (boolean) and data (T). Could the savings be 
due to the use of the mock class (which may be bloated compared to the original 
HoodieRecord)?
   
   But the hoodie write handles [deflate the HoodieRecord](https://github.com/apache/hudi/blob/cabcb2bf2cddedeb3a34047af3935b27cfdfb858/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieCreateHandle.java#L167) 
after writing, so the data portion should go away, reducing the amount of 
savings possible.
   
   Can you run the test again with these changes:
     1. WriteStatus status = new WriteStatus(true, 1.0);   // enable success 
record tracking as errors should be rare
     2. Create an actual HoodieRecord and use that in the for loop instead of 
the mock(HoodieRecord.class)
     3. Call deflate on the created HoodieRecord to remove the data, as the 
write handles do.
   
   I feel the above may give a more realistic view of savings. 
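
The three steps above could be sketched roughly as follows. This is a minimal, self-contained illustration and not the actual test: `SketchRecord` and `SketchWriteStatus` are hypothetical stand-ins for `HoodieRecord` and `WriteStatus` (they are not Hudi classes), and the payload size is an arbitrary assumption. It only shows the shape of the measurement: build a real record, deflate it as the write handles do, then hand it to a status object that tracks successful records.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for HoodieRecord: a key plus a payload that can be dropped.
final class SketchRecord {
    private final String recordKey;
    private byte[] data;

    SketchRecord(String recordKey, byte[] data) {
        this.recordKey = recordKey;
        this.data = data;
    }

    String getRecordKey() {
        return recordKey;
    }

    boolean hasData() {
        return data != null;
    }

    // Mirrors the idea of deflate(): release the payload, keep the key.
    void deflate() {
        this.data = null;
    }
}

// Hypothetical stand-in for WriteStatus with success-record tracking.
final class SketchWriteStatus {
    private final boolean trackSuccessRecords;
    private final List<SketchRecord> writtenRecords = new ArrayList<>();
    private long totalRecords = 0;

    SketchWriteStatus(boolean trackSuccessRecords) {
        this.trackSuccessRecords = trackSuccessRecords;
    }

    void markSuccess(SketchRecord record) {
        if (trackSuccessRecords) {
            writtenRecords.add(record); // this retained list is what costs memory
        }
        totalRecords++;
    }

    List<SketchRecord> getWrittenRecords() {
        return writtenRecords;
    }

    long getTotalRecords() {
        return totalRecords;
    }
}

final class DeflateSketch {
    // Steps 1-3: tracking enabled, real (non-mock) records, deflated before tracking.
    static SketchWriteStatus run(int numRecords, int payloadBytes) {
        SketchWriteStatus status = new SketchWriteStatus(true);
        for (int i = 0; i < numRecords; i++) {
            SketchRecord record = new SketchRecord("key-" + i, new byte[payloadBytes]);
            record.deflate();
            status.markSuccess(record);
        }
        return status;
    }

    public static void main(String[] args) {
        SketchWriteStatus status = run(1000, 1024);
        System.out.println("tracked records: " + status.getWrittenRecords().size());
    }
}
```

With the real classes, the memory retained per tracked record after deflation would be what is left to compare against the proposed change.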
   
   Also, how did you find this interesting optimization? I am interested as 
there may be other avenues for such savings within Hudi, so it would be good to 
know how you track these down.
   
   
   


