weimingdiit commented on PR #7362: URL: https://github.com/apache/hudi/pull/7362#issuecomment-1498766781
> @weimingdiit thanks for making the patch. I see the main problem here is that it's using in-memory size for the estimation which is actually intended for storage size, which may not be accurate. I have a different approach to estimate the size using sample write in [this PR](https://github.com/apache/hudi/pull/8390). pls take a look

This is a good approach, and the estimated size may well be more accurate. Still, I think I need to test with a data set to determine how much error the in-memory estimate has, and whether that error is acceptable. In addition, I think my implementation may be simpler: it does not need to use the file system, so it saves the overhead of creating and reading files.
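As a rough illustration of the test I have in mind, here is a minimal sketch that compares an in-memory size estimate against a serialized, compressed size for the same records. It is not the code from either PR and not Hudi's estimator: the helper names (`in_memory_size`, `stored_size`, `relative_error`), the sample records, and the use of JSON plus gzip as a stand-in for the storage format are all assumptions made for the example.

```python
import gzip
import json
import sys


def in_memory_size(obj):
    """Rough recursive in-memory size of a record built from dicts, lists and scalars.

    Shared objects (small ints, interned strings) are counted once per reference,
    so this overestimates; it is only meant as a crude stand-in.
    """
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(in_memory_size(k) + in_memory_size(v) for k, v in obj.items())
    elif isinstance(obj, (list, tuple)):
        size += sum(in_memory_size(item) for item in obj)
    return size


def stored_size(records):
    """Bytes after serializing and compressing the records (hypothetical stand-in for storage size)."""
    payload = "\n".join(json.dumps(r, sort_keys=True) for r in records).encode("utf-8")
    return len(gzip.compress(payload))


def relative_error(estimate, actual):
    """Absolute difference between estimate and actual, as a fraction of actual."""
    return abs(estimate - actual) / actual


# Hypothetical sample data; a real test would use a representative data set.
records = [{"id": i, "name": f"user-{i}", "tags": ["a", "b"]} for i in range(1000)]

memory_estimate = sum(in_memory_size(r) for r in records)
storage_bytes = stored_size(records)
error = relative_error(memory_estimate, storage_bytes)
print(f"in-memory={memory_estimate} stored={storage_bytes} relative error={error:.2f}")
```

Running something like this over several representative data sets would show whether the gap between the two numbers is stable enough to correct for, or too variable for the in-memory estimate to be acceptable.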
