etseidl commented on code in PR #9972:
URL: https://github.com/apache/arrow-rs/pull/9972#discussion_r3243284215
##########
parquet/src/data_type.rs:
##########
@@ -726,6 +726,17 @@ pub(crate) mod private {
(std::mem::size_of::<Self>(), 1)
}
+ /// Estimated encoded byte size of this value when serialized into a
+ /// plain-encoded data page. Used by the column writer to decide
+ /// whether to mini-batch a chunk in one call or value-by-value, so
+ /// that a single mini-batch of very large `BYTE_ARRAY` values can't
+ /// push a page far past the configured page byte limit before the
+ /// post-write size check fires.
+ #[inline]
+ fn byte_size(&self) -> usize {
Review Comment:
This appears to duplicate `dict_encoding_size`. Note also that #9700 proposes renaming
`dict_encoding_size` and implementing it in much the same way as here.
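
To illustrate the duplication concern, here is a rough sketch of a single size hook shared by both callers. It uses stand-in types rather than the real trait in `parquet/src/data_type.rs`: `EncodedSize`, `Bytes`, `dict_bytes`, and `fits_in_page` are hypothetical names, and the method bodies are assumptions, not the code from this PR or from #9700.

```rust
use std::mem::size_of;

/// Hypothetical stand-in for the crate-private value trait.
trait EncodedSize: Sized {
    /// One size estimate shared by every caller.
    /// Default (assumed): the in-memory width of a fixed-size value.
    fn byte_size(&self) -> usize {
        size_of::<Self>()
    }
}

impl EncodedSize for i32 {}
impl EncodedSize for f64 {}

/// Hypothetical stand-in for a variable-length `BYTE_ARRAY` value.
struct Bytes(Vec<u8>);

impl EncodedSize for Bytes {
    /// PLAIN-encoded `BYTE_ARRAY`: 4-byte length prefix plus the payload.
    fn byte_size(&self) -> usize {
        size_of::<u32>() + self.0.len()
    }
}

/// Caller 1 (hypothetical): size accounting for dictionary entries.
fn dict_bytes<T: EncodedSize>(uniques: &[T]) -> usize {
    uniques.iter().map(|v| v.byte_size()).sum()
}

/// Caller 2 (hypothetical): does a chunk fit under the page byte limit
/// if written as one mini-batch?
fn fits_in_page<T: EncodedSize>(chunk: &[T], limit: usize) -> bool {
    chunk.iter().map(|v| v.byte_size()).sum::<usize>() <= limit
}
```

With one hook, the dictionary accounting and the page-limit check cannot drift apart, which is the main argument for folding the two methods together.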