weimingdiit commented on PR #7362:
URL: https://github.com/apache/hudi/pull/7362#issuecomment-1498766781

   > @weimingdiit thanks for making the patch. I see the main problem here is 
that it's using in-memory size for the estimation which is actually intended 
for storage size, which may not be accurate. I have a different approach to 
estimate the size using sample write in [this 
PR](https://github.com/apache/hudi/pull/8390). pls take a look
   
   This is a good approach, and the estimated size may well be more accurate. 
However, I still need to test with a real dataset to determine how large the 
error of the in-memory estimate is and whether that error is acceptable. In 
addition, I think my implementation is simpler: it does not need to use the 
file system, so it avoids the overhead of creating and reading files.
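
As a minimal sketch of the error check described above: compare each in-memory estimate with the size actually measured on storage and see whether the relative error stays within a chosen tolerance. The class `SizeEstimateCheck`, its method names, the sample numbers, and the 25% tolerance are all hypothetical illustrations. None of them come from Hudi or from either PR.

```java
import java.util.List;

// Hypothetical helper, not part of Hudi: compares size estimates
// against sizes measured on storage.
public class SizeEstimateCheck {

    /** Relative error of an estimate against the measured size (0.2 means 20% off). */
    static double relativeError(long estimatedBytes, long actualBytes) {
        if (actualBytes <= 0) {
            throw new IllegalArgumentException("actualBytes must be positive");
        }
        return Math.abs((double) (estimatedBytes - actualBytes)) / actualBytes;
    }

    /** True if every {estimate, actual} pair is within the given tolerance. */
    static boolean allWithinTolerance(List<long[]> pairs, double tolerance) {
        for (long[] pair : pairs) {
            if (relativeError(pair[0], pair[1]) > tolerance) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Made-up numbers for illustration only:
        // {in-memory estimate in bytes, measured size on storage in bytes}.
        List<long[]> samples = List.of(
            new long[] {1_200_000L, 1_000_000L},
            new long[] {950_000L, 1_000_000L});

        for (long[] s : samples) {
            System.out.printf("estimate=%d actual=%d error=%.2f%n",
                s[0], s[1], relativeError(s[0], s[1]));
        }
        System.out.println("within tolerance: " + allWithinTolerance(samples, 0.25));
    }
}
```

Running this over a representative dataset would show whether the in-memory estimate is close enough to the storage size to be usable, or whether the sample-write approach is needed.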


