alamb commented on a change in pull request #10063:
URL: https://github.com/apache/arrow/pull/10063#discussion_r614746235
##########
File path: rust/arrow/src/record_batch.rs
##########
@@ -103,6 +103,56 @@ impl RecordBatch {
RecordBatch { schema, columns }
}
+ /// Creates a new [`RecordBatch`] with no columns
+ ///
+ /// TODO: add a code example using `append`
+ pub fn new() -> Self {
+ Self {
+ schema: Arc::new(Schema::empty()),
+ columns: Vec::new(),
+ }
+ }
+
+ /// Appends the `field_values` array to this `RecordBatch` as a
+ /// field named `field_name`.
+ ///
+ /// TODO: code example
+ ///
+ /// TODO: on error, can we return `Self` in some meaningful way?
+ pub fn append(self, field_name: &str, field_values: ArrayRef) -> Result<Self> {
+ if let Some(col) = self.columns.get(0) {
+ if col.len() != field_values.len() {
+ return Err(ArrowError::InvalidArgumentError(
+ format!("all columns in a record batch must have the same length. expected {}, field {} had {} ",
+ col.len(), field_name, field_values.len())
+ ));
+ }
+ }
+
+ let Self {
+ schema,
+ mut columns,
+ } = self;
+
+ // modify the schema we have if possible, otherwise copy
+ let mut schema = match Arc::try_unwrap(schema) {
+ Ok(schema) => schema,
+ Err(shared_schema) => shared_schema.as_ref().clone(),
+ };
+
+ let nullable = field_values.null_count() > 0;
Review comment:
I agree this would be an important point to clarify in the comments. If
you are creating more than one `RecordBatch`, you should use the existing API
to create each `RecordBatch` from a `Schema` and a `Vec<ArrayRef>`.
If you are creating only a single one, then `append` is more convenient.
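
To illustrate the trade-off, here is a minimal, self-contained sketch. It does not use the real `arrow` crate: `Schema`, `ArrayRef`, and `RecordBatch` below are simplified stand-ins, the error type is a plain `String`, and the two helper functions are hypothetical names made up for this example. The stand-in `try_new` models the existing constructor (presumably `RecordBatch::try_new`), and the stand-in `append` copies the schema handling from the diff above.

```rust
use std::sync::Arc;

// Hypothetical stand-in for arrow's `Schema`: just a list of field names.
#[derive(Clone, Debug, PartialEq)]
struct Schema {
    fields: Vec<String>,
}

// Hypothetical stand-in for arrow's `ArrayRef`: a shared vector of integers.
type ArrayRef = Arc<Vec<i64>>;

#[derive(Debug)]
struct RecordBatch {
    schema: Arc<Schema>,
    columns: Vec<ArrayRef>,
}

impl RecordBatch {
    /// Empty batch, mirroring `new` in the diff.
    fn new() -> Self {
        Self {
            schema: Arc::new(Schema { fields: Vec::new() }),
            columns: Vec::new(),
        }
    }

    /// Models the existing "schema plus columns" constructor.
    fn try_new(schema: Arc<Schema>, columns: Vec<ArrayRef>) -> Result<Self, String> {
        if schema.fields.len() != columns.len() {
            return Err("number of columns must match number of fields".to_string());
        }
        Ok(Self { schema, columns })
    }

    /// Models `append` from the diff: reuse the schema if this batch is its
    /// only owner, otherwise clone it, then add one field and one column.
    fn append(self, field_name: &str, field_values: ArrayRef) -> Result<Self, String> {
        if let Some(col) = self.columns.first() {
            if col.len() != field_values.len() {
                return Err(format!(
                    "expected length {}, field {} had {}",
                    col.len(),
                    field_name,
                    field_values.len()
                ));
            }
        }
        let Self { schema, mut columns } = self;
        let mut schema = match Arc::try_unwrap(schema) {
            Ok(schema) => schema,
            Err(shared_schema) => shared_schema.as_ref().clone(),
        };
        schema.fields.push(field_name.to_string());
        columns.push(field_values);
        Ok(Self { schema: Arc::new(schema), columns })
    }
}

/// Several batches: build the schema once and share it between batches.
fn two_batches_sharing_schema() -> Result<(RecordBatch, RecordBatch), String> {
    let schema = Arc::new(Schema { fields: vec!["a".to_string()] });
    let first = RecordBatch::try_new(Arc::clone(&schema), vec![Arc::new(vec![1, 2])])?;
    let second = RecordBatch::try_new(schema, vec![Arc::new(vec![3, 4, 5])])?;
    Ok((first, second))
}

/// Same data via `append`: convenient, but each batch gets its own schema.
fn two_batches_via_append() -> Result<(RecordBatch, RecordBatch), String> {
    let first = RecordBatch::new().append("a", Arc::new(vec![1, 2]))?;
    let second = RecordBatch::new().append("a", Arc::new(vec![3, 4, 5]))?;
    Ok((first, second))
}
```

In this sketch, the batches from `two_batches_sharing_schema` point at the same schema allocation, while the batches from `two_batches_via_append` hold equal but separately allocated schemas. That is the cost worth documenting for `append` when many batches are built.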