hililiwei edited a comment on pull request #3784:
URL: https://github.com/apache/iceberg/pull/3784#issuecomment-1022891787


   > @hililiwei, can you describe how you're estimating the size of data that 
is buffered in memory for ORC? I think a description to explain to reviewers 
would help.
   
   For a file that is still being written, I estimate its size in three parts:
   1. Data that has already been written to stripes. This value is obtained by 
summing the `offset` and `length` of the last stripe of the `writer`.
   2. Data that has been submitted to the `writer` but has not yet been 
written to a stripe. When the OrcFileAppender is created, the `treeWriter` is 
obtained through reflection, and its `estimateMemory` is used to estimate how 
much memory is in use.
   3. Data that has not yet been submitted to the `writer`, that is, the 
contents of the buffer. The default maximum size of the buffer is used here.
   
   The sum of these three values is the estimated data size.
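
   As a rough illustration of the arithmetic only, here is a minimal sketch. 
`StripeInfo`, `OrcSizeEstimate`, and the parameter names are hypothetical 
stand-ins, not the classes used in the PR. The sketch assumes the caller has 
already read the stripe list from the `writer`, obtained the `estimateMemory` 
value from `treeWriter`, and chosen the buffer's maximum size in bytes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the metadata of one flushed stripe (values in bytes).
final class StripeInfo {
  final long offset;
  final long length;

  StripeInfo(long offset, long length) {
    this.offset = offset;
    this.length = length;
  }
}

// Hypothetical helper that combines the three parts described above.
final class OrcSizeEstimate {
  private OrcSizeEstimate() {}

  // stripes: stripes already flushed, in file order (may be empty).
  // treeWriterMemory: assumed to be the value reported by treeWriter's estimateMemory.
  // maxBufferBytes: assumed upper bound for data not yet submitted to the writer.
  static long estimate(List<StripeInfo> stripes, long treeWriterMemory, long maxBufferBytes) {
    long flushed = 0L;
    if (!stripes.isEmpty()) {
      // Part 1: the end position of the last stripe is its offset plus its length.
      StripeInfo last = stripes.get(stripes.size() - 1);
      flushed = last.offset + last.length;
    }
    // Parts 2 and 3 are added on top of the flushed bytes.
    return flushed + treeWriterMemory + maxBufferBytes;
  }
}
```

   With no stripes flushed yet, the estimate is just the sum of the last two 
parts. Once stripes exist, only the last stripe matters for the first part, 
because its end position already covers everything written before it.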
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


