alamb commented on code in PR #8257:
URL: https://github.com/apache/arrow-rs/pull/8257#discussion_r2351939223
##########
parquet/src/column/writer/mod.rs:
##########
@@ -1104,12 +1104,23 @@ impl<'a, E: ColumnValueEncoder> GenericColumnWriter<'a, E> {
                     rep_levels_byte_len + def_levels_byte_len + values_data.buf.len();

                 // Data Page v2 compresses values only.
-                match self.compressor {
+                let is_compressed = match self.compressor {
                     Some(ref mut cmpr) => {
+                        let buffer_len = buffer.len();
                         cmpr.compress(&values_data.buf, &mut buffer)?;
+                        if uncompressed_size <= buffer.len() - buffer_len {

Review Comment:
   In my opinion, given there is a tradeoff here (file size vs. decode speed), we can't hard-code a heuristic that changes the default behavior.

   However, I think it would be a good improvement to add some sort of tuning knob for people who want to make that tradeoff explicitly, as long as the default setting of the knob preserves the existing behavior.
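
   To make the suggestion concrete, here is a minimal sketch of what such a knob might look like. Everything in it is hypothetical: `CompressionFallback`, its variants, and `keep_compressed` are illustrative names, not part of the parquet crate's API, and the savings-fraction policy is just one possible way to express the tradeoff. The point is only that the default variant keeps the compressor's output unconditionally, so existing behavior is unchanged unless a user opts in.

```rust
/// Hypothetical tuning knob (NOT an existing parquet crate type).
/// Decides whether a page keeps its compressed bytes or falls back
/// to storing the values uncompressed.
#[derive(Debug, Clone, Copy, PartialEq, Default)]
pub enum CompressionFallback {
    /// Always keep the compressor's output. This is the default, so
    /// writers that do not opt in behave exactly as they do today.
    #[default]
    Never,
    /// Keep the compressed bytes only if they are strictly smaller than
    /// the input and save at least this fraction of the uncompressed
    /// size. `IfSavingsBelow(0.0)` falls back whenever
    /// `uncompressed_len <= compressed_len`, the condition in the diff.
    IfSavingsBelow(f64),
}

impl CompressionFallback {
    /// Returns `true` if the compressed bytes should be written.
    /// The result could feed the page's `is_compressed` flag.
    pub fn keep_compressed(self, uncompressed_len: usize, compressed_len: usize) -> bool {
        match self {
            CompressionFallback::Never => true,
            CompressionFallback::IfSavingsBelow(min_savings) => {
                // The strict comparison also guards the division below
                // against an empty page (uncompressed_len == 0).
                compressed_len < uncompressed_len
                    && (uncompressed_len - compressed_len) as f64 / uncompressed_len as f64
                        >= min_savings
            }
        }
    }
}
```

   With this shape, a user who prefers decode speed over file size could set something like `IfSavingsBelow(0.1)` to skip compression on pages that shrink by less than 10%, while everyone else stays on `Never`.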