v0y4g3r commented on issue #4724: URL: https://github.com/apache/arrow-rs/issues/4724#issuecomment-1784008583
Ran into the same problem here. What's worse, the gap between the decoded data size and the real data size grows disproportionately with the number of fields in the record batch.

<img width="585" alt="image" src="https://github.com/apache/arrow-rs/assets/6406592/9493ac57-f2ef-4091-84d0-7a4ae77f6087">

```rs
use std::sync::Arc;

use arrow::array::{ArrayRef, Float64Array};
use arrow::datatypes::{DataType, Field, Schema};
use arrow::ipc::reader::FileReader;
use arrow::ipc::writer::FileWriter;
use arrow::record_batch::RecordBatch;

fn encode_and_decode(num_fields: usize) -> (usize, usize) {
    let fields = (0..num_fields)
        .map(|idx| Field::new(format!("f_{idx}"), DataType::Float64, false))
        .collect::<Vec<_>>();
    let schema = Arc::new(Schema::new(fields));

    // Encode a single-row batch into an in-memory IPC file.
    let mut buffer = vec![];
    let mut writer = FileWriter::try_new(&mut buffer, &schema).unwrap();
    let col_values = (0..num_fields)
        .map(|_| Arc::new(Float64Array::from(vec![1.0])) as ArrayRef)
        .collect::<Vec<_>>();
    let rb = RecordBatch::try_new(schema.clone(), col_values).unwrap();
    let write_mem_size = rb.get_array_memory_size();
    writer.write(&rb).unwrap();
    writer.close().unwrap();

    // Decode the batch back and measure its reported in-memory size.
    let mut reader = FileReader::try_new(std::io::Cursor::new(&buffer), None).unwrap();
    let rb = reader.next().unwrap().unwrap();
    let read_mem_size = rb.get_array_memory_size();
    (write_mem_size, read_mem_size)
}

fn main() {
    for num_cols in (0..51).step_by(10).skip(1) {
        let (write, read) = encode_and_decode(num_cols);
        println!("{},{},{}", num_cols, write, read);
    }
}
```
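For anyone digging into where the extra bytes come from, here's a minimal sketch (the `dump_column_sizes` helper is mine, not part of arrow) that prints each decoded column's reported size next to the capacities of its underlying buffers. If the decoded columns all point into one large shared IPC buffer, every column should report roughly that same large capacity:

```rs
use arrow::array::Array;
use arrow::record_batch::RecordBatch;

// Hypothetical helper (not from the issue): show, per column, the size
// reported by get_array_memory_size and the capacity of each backing buffer.
fn dump_column_sizes(rb: &RecordBatch) {
    let schema = rb.schema();
    for (field, col) in schema.fields().iter().zip(rb.columns()) {
        let data = col.to_data();
        let capacities: Vec<usize> = data.buffers().iter().map(|b| b.capacity()).collect();
        println!(
            "{}: get_array_memory_size = {}, buffer capacities = {:?}",
            field.name(),
            col.get_array_memory_size(),
            capacities
        );
    }
}
```

Calling this on the `rb` returned by the reader (right before computing `read_mem_size` above) should make it clear whether the inflated number comes from each column counting a shared backing buffer in full.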
