Rich-T-kid commented on code in PR #10137:
URL: https://github.com/apache/arrow-rs/pull/10137#discussion_r3484887705


##########
arrow-ipc/src/writer.rs:
##########
@@ -548,21 +548,34 @@ impl IpcDataGenerator {
         batch: &RecordBatch,
         dictionary_tracker: &mut DictionaryTracker,
         write_options: &IpcWriteOptions,
-        compression_context: &mut CompressionContext,
+        compression_context: &mut IpcWriteContext,
     ) -> Result<(Vec<EncodedData>, EncodedData), ArrowError> {
         let encoded_dictionaries = self.encode_all_dicts(
             batch,
             dictionary_tracker,
             write_options,
             compression_context,
         )?;
-        let mut arrow_data = Vec::new();
-        let (ipc_message, _, tail_pad) = self.record_batch_to_bytes(
+        let capacity = batch

Review Comment:
   Yes your right here. I was experimenting with collecting all the buffers 
first and then writing them to the pre-allocated vector to see if the compiler 
will optimize this into very fast memcpy operations. It more intuitive to use 
the `IpcBodySink::Write()` here with the stored buffer. 
   The issue here is that the scratch space field can really only work once. 
Since `EncodedData` needs to take ownership of the vector its gets moved out of 
the struct. Im taking another attempt at implementing a buffer pool here.



##########
arrow-ipc/src/writer.rs:
##########
@@ -548,21 +548,34 @@ impl IpcDataGenerator {
         batch: &RecordBatch,
         dictionary_tracker: &mut DictionaryTracker,
         write_options: &IpcWriteOptions,
-        compression_context: &mut CompressionContext,
+        compression_context: &mut IpcWriteContext,
     ) -> Result<(Vec<EncodedData>, EncodedData), ArrowError> {
         let encoded_dictionaries = self.encode_all_dicts(
             batch,
             dictionary_tracker,
             write_options,
             compression_context,
         )?;
-        let mut arrow_data = Vec::new();
-        let (ipc_message, _, tail_pad) = self.record_batch_to_bytes(
+        let capacity = batch

Review Comment:
   Yes your right. I was experimenting with collecting all the buffers first 
and then writing them to the pre-allocated vector to see if the compiler will 
optimize this into very fast memcpy operations. It more intuitive to use the 
`IpcBodySink::Write()` here with the stored buffer. 
   The issue here is that the scratch space field can really only work once. 
Since `EncodedData` needs to take ownership of the vector its gets moved out of 
the struct. Im taking another attempt at implementing a buffer pool here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to