v0y4g3r commented on issue #4724: URL: https://github.com/apache/arrow-rs/issues/4724#issuecomment-1784008583
Ran into the same problem here. What's worse, the gap between the decoded data size and the real data size grows disproportionately with the number of fields in the record batch.

<img width="585" alt="image" src="https://github.com/apache/arrow-rs/assets/6406592/9493ac57-f2ef-4091-84d0-7a4ae77f6087">

```rs
use std::sync::Arc;

use arrow::array::{ArrayRef, Float64Array};
use arrow::datatypes::{DataType, Field, Schema};
use arrow::ipc::reader::FileReader;
use arrow::ipc::writer::FileWriter;
use arrow::record_batch::RecordBatch;

fn encode_and_decode(num_fields: usize) -> (usize, usize) {
    let fields = (0..num_fields)
        .map(|idx| Field::new(format!("f_{idx}"), DataType::Float64, false))
        .collect::<Vec<_>>();
    let schema = Arc::new(Schema::new(fields));

    // Encode a single-row batch into an in-memory IPC file.
    let mut buffer = vec![];
    let mut writer = FileWriter::try_new(&mut buffer, &schema).unwrap();
    let col_values = (0..num_fields)
        .map(|_| Arc::new(Float64Array::from(vec![1.0])) as ArrayRef)
        .collect::<Vec<_>>();
    let rb = RecordBatch::try_new(schema.clone(), col_values).unwrap();
    let write_mem_size = rb.get_array_memory_size();
    writer.write(&rb).unwrap();
    writer.close().unwrap();

    // Decode the batch back and measure its reported in-memory size.
    let mut reader = FileReader::try_new(std::io::Cursor::new(&buffer), None).unwrap();
    let rb = reader.next().unwrap().unwrap();
    let read_mem_size = rb.get_array_memory_size();
    (write_mem_size, read_mem_size)
}

fn main() {
    for num_cols in (0..51).step_by(10).skip(1) {
        let (write, read) = encode_and_decode(num_cols);
        println!("{},{},{}", num_cols, write, read);
    }
}
```
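For anyone digging into where the extra bytes come from, here's a minimal sketch (the `dump_column_sizes` helper is mine, not part of arrow) that prints each decoded column's reported size next to the capacities of its underlying buffers. If the decoded columns all point into one large shared IPC buffer, every column should report roughly that same large capacity:

```rs
use arrow::array::Array;
use arrow::record_batch::RecordBatch;

// Hypothetical helper (not from the issue): show, per column, the size
// reported by get_array_memory_size and the capacity of each backing buffer.
fn dump_column_sizes(rb: &RecordBatch) {
    let schema = rb.schema();
    for (field, col) in schema.fields().iter().zip(rb.columns()) {
        let data = col.to_data();
        let capacities: Vec<usize> = data.buffers().iter().map(|b| b.capacity()).collect();
        println!(
            "{}: get_array_memory_size = {}, buffer capacities = {:?}",
            field.name(),
            col.get_array_memory_size(),
            capacities
        );
    }
}
```

Calling this on the `rb` returned by the reader (right before computing `read_mem_size` above) should make it clear whether the inflated number comes from each column counting a shared backing buffer in full.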
