lilianm commented on code in PR #8527:
URL: https://github.com/apache/arrow-rs/pull/8527#discussion_r2397549308
##########
parquet/src/column/writer/mod.rs:
##########
@@ -1073,6 +1073,7 @@ impl<'a, E: ColumnValueEncoder> GenericColumnWriter<'a, E> {
if let Some(ref mut cmpr) = self.compressor {
let mut compressed_buf = Vec::with_capacity(uncompressed_size);
cmpr.compress(&buffer[..], &mut compressed_buf)?;
+ compressed_buf.shrink_to_fit();
Review Comment:
The cost of the copy is fairly insignificant, because memcpy runs at around
10000 MB/s while compression runs at around 600 MB/s. Under the hood, the
vector uses the allocator's shrink method:
https://doc.rust-lang.org/alloc/alloc/trait.Allocator.html#method.shrink.
With standard malloc, the threshold for switching to mmap allocation is
128 KiB, and to shrink such an allocation the system only unmaps pages, so no
memory copy is needed.
In V2, the page buffer is not reserved.
As for leaving a page uncompressed when compression does badly, it could be a
good idea to apply that to V1 as well.
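
To illustrate the point about reserving and then shrinking, here is a minimal
sketch. It is not the writer's actual code: `fake_compress` and
`compress_and_shrink` are hypothetical names, and the "compressor" simply keeps
every fourth byte so that the output is much smaller than the reserved
capacity.

```rust
/// Hypothetical stand-in for a real codec: keeps every fourth input byte,
/// so the output is about a quarter of the input size.
fn fake_compress(input: &[u8], out: &mut Vec<u8>) {
    out.extend(input.iter().step_by(4).copied());
}

/// Mirrors the pattern in the diff: reserve the uncompressed size up front,
/// compress into the buffer, then release the unused tail of the allocation.
fn compress_and_shrink(input: &[u8]) -> Vec<u8> {
    let mut compressed_buf = Vec::with_capacity(input.len());
    fake_compress(input, &mut compressed_buf);
    // Without this call the buffer keeps its full reserved capacity
    // for as long as the page is held in memory.
    compressed_buf.shrink_to_fit();
    compressed_buf
}

fn main() {
    let input = vec![0u8; 1 << 20]; // 1 MiB "page"
    let out = compress_and_shrink(&input);
    println!("len = {}, capacity = {}", out.len(), out.capacity());
}
```

Whether the shrink copies data or only releases pages depends on the allocator
in use; the sketch only shows that the retained capacity drops below the
reserved size.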