tustvold commented on code in PR #4280:
URL: https://github.com/apache/arrow-rs/pull/4280#discussion_r1208030245
##########
parquet/src/column/writer/mod.rs:
##########
@@ -421,10 +449,24 @@ impl<'a, E: ColumnValueEncoder> GenericColumnWriter<'a, E> {
     /// Returns total number of bytes written by this column writer so far.
     /// This value is also returned when column writer is closed.
+    ///
+    /// Note: this value does not include any buffered data that has not
+    /// yet been flushed to a page. It is therefore an underestimate
     pub fn get_total_bytes_written(&self) -> u64 {
         self.column_metrics.total_bytes_written
     }
+    /// Returns the estimated total bytes for this column writer
+    ///
+    /// Unlike [`Self::get_total_bytes_written`] this includes an estimate
+    /// of any data that has not yet been flushed to a page
+    #[cfg(feature = "arrow")]
+    pub(crate) fn get_estimated_total_bytes(&self) -> u64 {
Review Comment:
I've left this `pub(crate)` because I'm not sure about the returned value.
It is potentially misleading, as it doesn't account for the effect of any
block compression on the size of the final output.
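
To make the concern concrete, here is a minimal sketch of how such an
estimate can drift from the final size. `SketchColumnWriter`, its fields,
and `flush_page` are hypothetical and are not the parquet crate's
implementation; only the two getter names are taken from the diff above.
The sketch assumes that flushed pages are counted at their compressed size
while buffered data is counted at its uncompressed size.

```rust
/// Hypothetical stand-in for a column writer's size bookkeeping.
/// The getter names mirror the diff; everything else is an assumption
/// made for illustration only.
struct SketchColumnWriter {
    /// Bytes of pages already flushed (assumed to be post-compression).
    total_bytes_written: u64,
    /// Bytes encoded but not yet flushed to a page (pre-compression).
    buffered_bytes: u64,
}

impl SketchColumnWriter {
    /// Counts flushed pages only, so it underestimates while data is buffered.
    fn get_total_bytes_written(&self) -> u64 {
        self.total_bytes_written
    }

    /// Adds the uncompressed buffered size to the flushed total.
    /// Compression has not been applied to the buffer yet, so this can
    /// overestimate the final output size.
    fn get_estimated_total_bytes(&self) -> u64 {
        self.total_bytes_written + self.buffered_bytes
    }

    /// Simulates flushing the buffer through a block compressor whose
    /// output is `ratio_num / ratio_den` of its input (an assumed ratio).
    fn flush_page(&mut self, ratio_num: u64, ratio_den: u64) {
        self.total_bytes_written += self.buffered_bytes * ratio_num / ratio_den;
        self.buffered_bytes = 0;
    }
}
```

With 1000 bytes already flushed and 400 bytes buffered, the estimate is
1400. If the buffer then compresses 4:1 on flush, the written total becomes
1100, so the earlier estimate was 300 bytes too high.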