[GitHub] [arrow-rs] alamb opened a new issue #210: Add Builder interface for adding Arrays to record batches

GitBox Mon, 26 Apr 2021 05:51:12 -0700


alamb opened a new issue #210:
URL: https://github.com/apache/arrow-rs/issues/210



   *Note*: migrated from original JIRA: 
https://issues.apache.org/jira/browse/ARROW-12411
   
   
   Use case:
   
   While writing tests (both in IOx and in DataFusion) where I need a single 
`RecordBatch`, I often find myself doing something like this:
   
   ```
           let schema = Arc::new(Schema::new(vec![
               ArrowField::new("float_field", ArrowDataType::Float64, true),
               ArrowField::new("time", ArrowDataType::Int64, true),
           ]));
   
           let float_array: ArrayRef =
               Arc::new(Float64Array::from(vec![10.1, 20.1, 30.1, 40.1]));
           let timestamp_array: ArrayRef =
               Arc::new(Int64Array::from(vec![1000, 2000, 3000, 4000]));
   
           let batch = RecordBatch::try_new(schema, vec![float_array, timestamp_array])
               .expect("created new record batch");
   ```
   
   This is annoying because the information that `float_field` is a float is 
encoded both in the Schema and in the `Float64Array`.
   
   I would much rather be able to construct RecordBatches in a builder 
style, to avoid the redundancy and reduce the amount of typing:
   
   
   ```
   
           let float_array: ArrayRef =
               Arc::new(Float64Array::from(vec![10.1, 20.1, 30.1, 40.1]));
           let timestamp_array: ArrayRef =
               Arc::new(Int64Array::from(vec![1000, 2000, 3000, 4000]));
   
           let batch = RecordBatch::empty()
             .append("float_field", float_array).unwrap()
             .append("time", timestamp_array).unwrap();
   
   ```
   
   The proposal is to add a method to `RecordBatch` like
   
   ```
   impl RecordBatch {
   ...
     fn append(self, field_name: &str, field_values: ArrayRef) -> Result<Self>
   }
   ```
   
   That would append a new field named `field_name` to the current schema, with 
`field_values` as its column, returning an error if `field_name` was already 
present.
   
   The nullability of the field would be set based on the actual null count of 
`field_values` (for example, nullable only if `field_values` contains at least 
one null).
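
To make the intended semantics concrete, here is a minimal, std-only sketch of the proposed `append` behaviour. It does not use the arrow crate: `Batch`, `Column`, `Field`, `DataType`, `BatchError` and `build_example` are hypothetical stand-ins invented for this illustration, and the row-count check is an extra assumption that the proposal above does not mention. It only models the duplicate-name error and the nullability inference described above.

```rust
// Toy stand-ins for Arrow arrays; NOT the arrow crate's types.
#[derive(Debug, Clone, PartialEq)]
enum Column {
    Float64(Vec<Option<f64>>),
    Int64(Vec<Option<i64>>),
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum DataType {
    Float64,
    Int64,
}

impl Column {
    fn len(&self) -> usize {
        match self {
            Column::Float64(v) => v.len(),
            Column::Int64(v) => v.len(),
        }
    }

    fn null_count(&self) -> usize {
        match self {
            Column::Float64(v) => v.iter().filter(|x| x.is_none()).count(),
            Column::Int64(v) => v.iter().filter(|x| x.is_none()).count(),
        }
    }

    fn data_type(&self) -> DataType {
        match self {
            Column::Float64(_) => DataType::Float64,
            Column::Int64(_) => DataType::Int64,
        }
    }
}

#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: DataType,
    nullable: bool,
}

#[derive(Debug, PartialEq)]
enum BatchError {
    DuplicateField(String),
    // Assumption: columns must share a row count (not stated in the proposal).
    LengthMismatch { expected: usize, actual: usize },
}

#[derive(Debug, Default)]
struct Batch {
    fields: Vec<Field>,
    columns: Vec<Column>,
}

impl Batch {
    fn empty() -> Self {
        Self::default()
    }

    // Consumes the batch and returns it with one more column, so calls chain.
    fn append(mut self, field_name: &str, field_values: Column) -> Result<Self, BatchError> {
        if self.fields.iter().any(|f| f.name == field_name) {
            return Err(BatchError::DuplicateField(field_name.to_string()));
        }
        if let Some(first) = self.columns.first() {
            if first.len() != field_values.len() {
                return Err(BatchError::LengthMismatch {
                    expected: first.len(),
                    actual: field_values.len(),
                });
            }
        }
        // The type and nullability come from the data, not from a separate schema.
        self.fields.push(Field {
            name: field_name.to_string(),
            data_type: field_values.data_type(),
            nullable: field_values.null_count() > 0,
        });
        self.columns.push(field_values);
        Ok(self)
    }
}

// Usage: the "time" column contains one null purely for illustration.
fn build_example() -> Result<Batch, BatchError> {
    Batch::empty()
        .append("float_field", Column::Float64(vec![Some(10.1), Some(20.1)]))?
        .append("time", Column::Int64(vec![Some(1000), None]))
}
```

In this model the field type is stated once, by the column itself, which is the redundancy the proposal aims to remove. A real implementation on `RecordBatch` would also have to decide how an empty batch with no columns is represented.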
   


