hililiwei commented on pull request #3784: URL: https://github.com/apache/iceberg/pull/3784#issuecomment-1022891787
> @hililiwei, can you describe how you're estimating the size of data that is buffered in memory for ORC? I think a description to explain to reviewers would help.

While a file is being written, its size is estimated in three steps:

1. Size of the data that has already been written to stripes. This value is obtained by summing the `offset` and `length` of the last stripe of the `writer`.
2. Size of the data that has been submitted to the `writer` but has not yet been written to a stripe. When the OrcFileAppender is created, the `treeWriter` is obtained through reflection, and its `estimateMemory` is used to estimate how much memory is in use.
3. Size of the data that has not yet been submitted to the `writer`, that is, the size of the buffer. The default maximum size of the buffer is used here.

These three values are added together to estimate the data size.
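
As a rough illustration of the three-step sum described above, here is a minimal, self-contained sketch. It is not the code from the PR: `OrcSizeEstimate`, `Stripe`, and `estimate` are hypothetical names, the stripe list stands in for whatever the ORC `writer` reports, and `treeWriterMemory` and `maxBufferBytes` are assumed to be supplied by the caller (for example, from `estimateMemory` and from the default maximum buffer size).

```java
import java.util.Arrays;
import java.util.List;

public class OrcSizeEstimate {
    /** Hypothetical stand-in for the metadata of a stripe already written to the file. */
    static final class Stripe {
        final long offset;  // byte offset of the stripe within the file
        final long length;  // byte length of the stripe

        Stripe(long offset, long length) {
            this.offset = offset;
            this.length = length;
        }
    }

    /**
     * Adds the three components of the estimate:
     * 1. bytes already written to stripes (offset + length of the last stripe),
     * 2. memory held by the writer but not yet written to a stripe,
     * 3. the buffer of rows not yet submitted to the writer (assumed to be at its maximum).
     */
    static long estimate(List<Stripe> stripes, long treeWriterMemory, long maxBufferBytes) {
        long writtenToStripes = 0L;
        if (!stripes.isEmpty()) {
            Stripe last = stripes.get(stripes.size() - 1);
            writtenToStripes = last.offset + last.length;
        }
        return writtenToStripes + treeWriterMemory + maxBufferBytes;
    }

    public static void main(String[] args) {
        // Illustrative numbers only; they are not measurements from a real ORC file.
        List<Stripe> stripes = Arrays.asList(new Stripe(3L, 1000L), new Stripe(1003L, 2000L));
        System.out.println(estimate(stripes, 512L, 1024L)); // prints 4539
    }
}
```

Because step 3 always assumes the maximum buffer size, the sketch tends to overestimate when the buffer is only partly filled, which seems to match the intent of the description.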
