alamb opened a new issue #210: URL: https://github.com/apache/arrow-rs/issues/210
*Note*: migrated from original JIRA: https://issues.apache.org/jira/browse/ARROW-12411

Use case: While writing tests (both in IOx and in DataFusion) where I need a single `RecordBatch`, I often find myself doing something like this:

```
let schema = Arc::new(Schema::new(vec![
    ArrowField::new("float_field", ArrowDataType::Float64, true),
    ArrowField::new("time", ArrowDataType::Int64, true),
]));

let float_array: ArrayRef = Arc::new(Float64Array::from(vec![10.1, 20.1, 30.1, 40.1]));
let timestamp_array: ArrayRef = Arc::new(Int64Array::from(vec![1000, 2000, 3000, 4000]));

let batch = RecordBatch::try_new(schema, vec![float_array, timestamp_array])
    .expect("created new record batch");
```

This is annoying because the information that `float_field` is a float is encoded both in the `Schema` and in the `Float64Array`.

I would much rather be able to construct `RecordBatch`es in a builder style, to avoid the redundancy and reduce the amount of typing:

```
let float_array: ArrayRef = Arc::new(Float64Array::from(vec![10.1, 20.1, 30.1, 40.1]));
let timestamp_array: ArrayRef = Arc::new(Int64Array::from(vec![1000, 2000, 3000, 4000]));

let batch = RecordBatch::empty()
    .append("float_field", float_array).unwrap()
    .append("time", timestamp_array).unwrap();
```

The proposal is to add a method to `RecordBatch` like

```
impl RecordBatch {
    ...
    fn append(self, field_name: &str, field_values: ArrayRef) -> Result<Self>
}
```

That would append the field name to the current schema, returning an error if `field_name` was already present. The nullability of the field would be set based on the actual null count of `field_values`.
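To make the intended semantics concrete, here is a minimal sketch that uses only the standard library. `Column`, `Field`, and `Batch` are hypothetical toy stand-ins for `ArrayRef`, the schema field type, and `RecordBatch`; none of this is the arrow API. The length check is an extra assumption, added because `RecordBatch::try_new` already requires all columns to have the same length.

```rust
/// Toy stand-in for an Arrow array: a typed column of optional values.
#[derive(Debug, Clone, PartialEq)]
enum Column {
    Float64(Vec<Option<f64>>),
    Int64(Vec<Option<i64>>),
}

impl Column {
    fn len(&self) -> usize {
        match self {
            Column::Float64(v) => v.len(),
            Column::Int64(v) => v.len(),
        }
    }

    /// Number of missing (`None`) values in the column.
    fn null_count(&self) -> usize {
        match self {
            Column::Float64(v) => v.iter().filter(|x| x.is_none()).count(),
            Column::Int64(v) => v.iter().filter(|x| x.is_none()).count(),
        }
    }

    fn type_name(&self) -> &'static str {
        match self {
            Column::Float64(_) => "Float64",
            Column::Int64(_) => "Int64",
        }
    }
}

/// Toy stand-in for a schema field.
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: &'static str,
    nullable: bool,
}

/// Toy stand-in for `RecordBatch`: the schema is derived from the columns.
#[derive(Debug, Default)]
struct Batch {
    fields: Vec<Field>,
    columns: Vec<Column>,
}

impl Batch {
    fn empty() -> Self {
        Self::default()
    }

    /// Appends a named column, deriving its type and nullability from the values.
    fn append(mut self, field_name: &str, values: Column) -> Result<Self, String> {
        // Proposed behaviour: duplicate field names are an error.
        if self.fields.iter().any(|f| f.name == field_name) {
            return Err(format!("field '{}' is already present", field_name));
        }
        // Assumption (not stated in the proposal): all columns share one length.
        if let Some(first) = self.columns.first() {
            if first.len() != values.len() {
                return Err(format!(
                    "field '{}' has {} rows, expected {}",
                    field_name,
                    values.len(),
                    first.len()
                ));
            }
        }
        self.fields.push(Field {
            name: field_name.to_string(),
            data_type: values.type_name(),
            // Proposed behaviour: nullable only if the data actually contains nulls.
            nullable: values.null_count() > 0,
        });
        self.columns.push(values);
        Ok(self)
    }
}

/// Builds the two-column batch from the example above, builder style.
fn build_example() -> Result<Batch, String> {
    let floats = Column::Float64(vec![Some(10.1), Some(20.1), Some(30.1), Some(40.1)]);
    let times = Column::Int64(vec![Some(1000), Some(2000), Some(3000), Some(4000)]);
    Batch::empty()
        .append("float_field", floats)?
        .append("time", times)
}
```

One consequence of deriving nullability from the data is that both fields in `build_example` would come out non-nullable, whereas the hand-written schema above declares them nullable. Tests that compare schemas may need to account for that difference.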
