Re: [PR] [10125] [encode path] Minor optimizations to arrow-flight [arrow-rs]

via GitHub Fri, 26 Jun 2026 11:55:59 -0700


alamb commented on code in PR #10137:
URL: https://github.com/apache/arrow-rs/pull/10137#discussion_r3483446258



##########
arrow-ipc/src/compression.rs:
##########
@@ -18,22 +18,24 @@
 use crate::CompressionType;
 use arrow_buffer::Buffer;
 use arrow_schema::ArrowError;
+use flatbuffers::FlatBufferBuilder;
 
 const LENGTH_NO_COMPRESSED_DATA: i64 = -1;
 const LENGTH_OF_PREFIX_DATA: i64 = 8;
 
-/// Additional context that may be needed for compression.
-///
-/// In the case of zstd, this will contain the zstd context, which can be 
reused between subsequent
-/// compression calls to avoid the performance overhead of initialising a new 
context for every
-/// compression.
+/// - The flatbuffer builder (`fbb`) is reset and reused across calls.
+/// - The zstd compressor (when enabled) is kept alive to avoid 
re-initialisation overhead.
 #[derive(Default)]
-pub struct CompressionContext {
+pub struct IpcWriteContext {

Review Comment:
   I think since this is a public struct we can't rename it wihtout breaking 
downstream consumers
   
https://docs.rs/arrow-ipc/latest/arrow_ipc/writer/struct.CompressionContext.html
   
   How about adding a deprecated tpyedef?
   
   ```rust
     #[deprecated(since = "57.0.0", note = "Use IpcWriteContext instead")]
     pub type CompressionContext = IpcWriteContext;
   ```



##########
arrow-flight/src/encode.rs:
##########
@@ -701,7 +701,7 @@ struct FlightIpcEncoder {
     options: IpcWriteOptions,
     data_gen: IpcDataGenerator,
     dictionary_tracker: DictionaryTracker,
-    compression_context: CompressionContext,
+    compression_context: IpcWriteContext,

Review Comment:
   Can we also chang the field name to match the new IpcWriteContext ?
   
   ```rust
       ipc_write_context: IpcWriteContext,
   ```
   
   There are many other places too like 
   
https://github.com/apache/arrow-rs/blob/bbca95eed8d3b2eaae5934ce9fe953a1757fc0a8/arrow-ipc/src/writer.rs#L551-L550



##########
arrow-ipc/src/compression.rs:
##########
@@ -18,22 +18,24 @@
 use crate::CompressionType;
 use arrow_buffer::Buffer;
 use arrow_schema::ArrowError;
+use flatbuffers::FlatBufferBuilder;
 
 const LENGTH_NO_COMPRESSED_DATA: i64 = -1;
 const LENGTH_OF_PREFIX_DATA: i64 = 8;
 
-/// Additional context that may be needed for compression.
-///
-/// In the case of zstd, this will contain the zstd context, which can be 
reused between subsequent
-/// compression calls to avoid the performance overhead of initialising a new 
context for every
-/// compression.
+/// - The flatbuffer builder (`fbb`) is reset and reused across calls.
+/// - The zstd compressor (when enabled) is kept alive to avoid 
re-initialisation overhead.
 #[derive(Default)]
-pub struct CompressionContext {
+pub struct IpcWriteContext {
     #[cfg(feature = "zstd")]
     compressor: Option<zstd::bulk::Compressor<'static>>,
+    pub(crate) fbb: FlatBufferBuilder<'static>,

Review Comment:
   Can we please not make the fields `pub` / `pub (crate)` and instead add 
accessor methods -- that way we will have less API surface to break int he 
future



##########
arrow-ipc/src/writer.rs:
##########
@@ -548,21 +548,34 @@ impl IpcDataGenerator {
         batch: &RecordBatch,
         dictionary_tracker: &mut DictionaryTracker,
         write_options: &IpcWriteOptions,
-        compression_context: &mut CompressionContext,
+        compression_context: &mut IpcWriteContext,
     ) -> Result<(Vec<EncodedData>, EncodedData), ArrowError> {
         let encoded_dictionaries = self.encode_all_dicts(
             batch,
             dictionary_tracker,
             write_options,
             compression_context,
         )?;
-        let mut arrow_data = Vec::new();
-        let (ipc_message, _, tail_pad) = self.record_batch_to_bytes(
+        let capacity = batch

Review Comment:
   what is this code doing? I see it o some size estimateion but O don't 
understand the `arrow_data.extend_from_slice` calls -- why not keep this as 
IpcBodySink::write and just pass the arrow_data from the compression context?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] [10125] [encode path] Minor optimizations to arrow-flight [arrow-rs]

Reply via email to