Rich-T-kid commented on code in PR #10137:
URL: https://github.com/apache/arrow-rs/pull/10137#discussion_r3484887705
##########
arrow-ipc/src/writer.rs:
##########
@@ -548,21 +548,34 @@ impl IpcDataGenerator {
batch: &RecordBatch,
dictionary_tracker: &mut DictionaryTracker,
write_options: &IpcWriteOptions,
- compression_context: &mut CompressionContext,
+ compression_context: &mut IpcWriteContext,
) -> Result<(Vec<EncodedData>, EncodedData), ArrowError> {
let encoded_dictionaries = self.encode_all_dicts(
batch,
dictionary_tracker,
write_options,
compression_context,
)?;
- let mut arrow_data = Vec::new();
- let (ipc_message, _, tail_pad) = self.record_batch_to_bytes(
+ let capacity = batch
Review Comment:
Yes your right here. I was experimenting with collecting all the buffers
first and then writing them to the pre-allocated vector to see if the compiler
will optimize this into very fast memcpy operations. It more intuitive to use
the `IpcBodySink::Write()` here with the stored buffer.
The issue here is that the scratch space field can really only work once.
Since `EncodedData` needs to take ownership of the vector its gets moved out of
the struct. Im taking another attempt at implementing a buffer pool here.
##########
arrow-ipc/src/writer.rs:
##########
@@ -548,21 +548,34 @@ impl IpcDataGenerator {
batch: &RecordBatch,
dictionary_tracker: &mut DictionaryTracker,
write_options: &IpcWriteOptions,
- compression_context: &mut CompressionContext,
+ compression_context: &mut IpcWriteContext,
) -> Result<(Vec<EncodedData>, EncodedData), ArrowError> {
let encoded_dictionaries = self.encode_all_dicts(
batch,
dictionary_tracker,
write_options,
compression_context,
)?;
- let mut arrow_data = Vec::new();
- let (ipc_message, _, tail_pad) = self.record_batch_to_bytes(
+ let capacity = batch
Review Comment:
Yes your right. I was experimenting with collecting all the buffers first
and then writing them to the pre-allocated vector to see if the compiler will
optimize this into very fast memcpy operations. It more intuitive to use the
`IpcBodySink::Write()` here with the stored buffer.
The issue here is that the scratch space field can really only work once.
Since `EncodedData` needs to take ownership of the vector its gets moved out of
the struct. Im taking another attempt at implementing a buffer pool here.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]