alamb commented on code in PR #8257:
URL: https://github.com/apache/arrow-rs/pull/8257#discussion_r2351939223


##########
parquet/src/column/writer/mod.rs:
##########
@@ -1104,12 +1104,23 @@ impl<'a, E: ColumnValueEncoder> GenericColumnWriter<'a, E> {
                     rep_levels_byte_len + def_levels_byte_len + values_data.buf.len();
 
                 // Data Page v2 compresses values only.
-                match self.compressor {
+                let is_compressed = match self.compressor {
                     Some(ref mut cmpr) => {
+                        let buffer_len = buffer.len();
                         cmpr.compress(&values_data.buf, &mut buffer)?;
+                        if uncompressed_size <= buffer.len() - buffer_len {

Review Comment:
   In my opinion, because there is a tradeoff here (file size vs. decode
   speed), we can't hard-code a heuristic that changes the default behavior.
   
   However, I think it would be a good improvement to add a tuning knob for
   people who want to make that tradeoff explicitly, as long as the knob's
   default setting leaves the existing behavior unchanged.
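
   To make the suggestion concrete, here is a minimal, standalone sketch of
   what such a knob could look like. The names `IncompressiblePagePolicy` and
   `select_page_values` are hypothetical and are not part of the parquet
   crate; the sketch only illustrates a setting whose default preserves the
   current "always store the compressor output" behavior, with the fallback
   from this PR as an opt-in.

```rust
/// Hypothetical tuning knob (an assumption for illustration, not an
/// existing parquet crate API).
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub enum IncompressiblePagePolicy {
    /// Existing behavior: always store the compressor's output.
    #[default]
    AlwaysCompress,
    /// Opt-in: store the raw values when compression does not shrink them.
    StoreUncompressed,
}

/// Returns the bytes to write for the page values and whether they are
/// compressed (the flag a Data Page v2 header would carry).
pub fn select_page_values(
    policy: IncompressiblePagePolicy,
    uncompressed: &[u8],
    compressed: Vec<u8>,
) -> (Vec<u8>, bool) {
    match policy {
        IncompressiblePagePolicy::AlwaysCompress => (compressed, true),
        IncompressiblePagePolicy::StoreUncompressed => {
            // Same comparison as the PR: fall back when the compressed
            // output is not strictly smaller than the input.
            if uncompressed.len() <= compressed.len() {
                (uncompressed.to_vec(), false)
            } else {
                (compressed, true)
            }
        }
    }
}
```

   With `IncompressiblePagePolicy::default()` the writer would behave exactly
   as it does today; only users who select `StoreUncompressed` would trade a
   possibly different file layout for cheaper decoding of incompressible
   pages.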


