alamb commented on code in PR #9400:
URL: https://github.com/apache/arrow-rs/pull/9400#discussion_r2801141572
##########
parquet/src/compression.rs:
##########
@@ -503,20 +503,27 @@ pub use lz4_codec::*;
#[cfg(any(feature = "zstd", test))]
mod zstd_codec {
- use std::io::{self, Write};
-
use crate::compression::{Codec, ZstdLevel};
use crate::errors::Result;
/// Codec for Zstandard compression algorithm.
+ ///
+ /// Uses `zstd::bulk` API with reusable compressor/decompressor contexts
+ /// to avoid the overhead of reinitializing contexts for each operation.
pub struct ZSTDCodec {
- level: ZstdLevel,
+ compressor: zstd::bulk::Compressor<'static>,
+ decompressor: zstd::bulk::Decompressor<'static>,
Review Comment:
FWIW that is what the underlying compressor docs seem to suggest as well:
<img width="873" height="305" alt="Image"
src="https://github.com/user-attachments/assets/1896b263-e919-4541-b0f7-22ba1d17b330"
/>
However, given this one is already better than main i think we could merge
this as is and then look into a thread-local as a follow on
One downside of a thread-local is that it isn't clear to me how it would
ever be cleared 🤔
##########
arrow-ipc/src/reader.rs:
##########
@@ -490,6 +495,7 @@ impl<'a> RecordBatchDecoder<'a> {
schema,
dictionaries_by_id,
compression,
+ decompression_context: DecompressionContext::default(),
Review Comment:
as a small nit, I personally prefer a `new()` method as well here
```suggestion
decompression_context: DecompressionContext::new(),
```
And the new method can just call through to the Default impl
I find that more consistent
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]