etseidl commented on code in PR #8257:
URL: https://github.com/apache/arrow-rs/pull/8257#discussion_r2322757407


##########
parquet/src/column/writer/mod.rs:
##########
@@ -1104,12 +1104,23 @@ impl<'a, E: ColumnValueEncoder> GenericColumnWriter<'a, 
E> {
                     rep_levels_byte_len + def_levels_byte_len + 
values_data.buf.len();
 
                 // Data Page v2 compresses values only.
-                match self.compressor {
+                let is_compressed = match self.compressor {
                     Some(ref mut cmpr) => {
+                        let buffer_len = buffer.len();
                         cmpr.compress(&values_data.buf, &mut buffer)?;
+                        if uncompressed_size <= buffer.len() - buffer_len {

Review Comment:
   I'd suggest maybe going the opposite direction.  If compression doesn't give 
say a 10% size reduction, then don't compress...it will be faster to read. 10% 
might be too much...maybe we could make it configurable.
   
   Anyway, that's probably out of scope for this PR. Let's just go with the 
simple test for now...it's still an improvement.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to