returnString commented on a change in pull request #10063:
URL: https://github.com/apache/arrow/pull/10063#discussion_r614932667
##########
File path: rust/arrow/src/record_batch.rs
##########
@@ -245,6 +295,12 @@ impl RecordBatch {
}
}
+impl Default for RecordBatch {
Review comment:
Not totally sure about this because it bypasses the invariants of
`validate_new_batch` - it seems like we don't want to allow for 0-column record
batches to exist. Might be an argument in favour of the `impl From` approach?
##########
File path: rust/arrow/src/record_batch.rs
##########
@@ -103,6 +103,56 @@ impl RecordBatch {
RecordBatch { schema, columns }
}
+ /// Creates a new [`RecordBatch`] with no columns
+ ///
+ /// TODO add an code example using `append`
+ pub fn new() -> Self {
+ Self {
+ schema: Arc::new(Schema::empty()),
+ columns: Vec::new(),
+ }
+ }
+
+ /// Appends the `field_array` array to this `RecordBatch` as a
+ /// field named `field_name`.
+ ///
+ /// TODO: code example
+ ///
+ /// TODO: on error, can we return `Self` in some meaningful way?
+ pub fn append(self, field_name: &str, field_values: ArrayRef) ->
Result<Self> {
+ if let Some(col) = self.columns.get(0) {
+ if col.len() != field_values.len() {
+ return Err(ArrowError::InvalidArgumentError(
+ format!("all columns in a record batch must have the same
length. expected {}, field {} had {} ",
+ col.len(), field_name, field_values.len())
+ ));
+ }
+ }
+
+ let Self {
+ schema,
+ mut columns,
+ } = self;
+
+ // modify the schema we have if possible, otherwise copy
+ let mut schema = match Arc::try_unwrap(schema) {
+ Ok(schema) => schema,
+ Err(shared_schema) => shared_schema.as_ref().clone(),
+ };
+
+ let nullable = field_values.null_count() > 0;
Review comment:
This isn't an area that I'm super-familiar with, but are there any
existing methods or expectations around merging schemas that differ solely on
nullability, e.g. merging two schemas together and inferring the nullability of
each field by ORing the nullability on each side of the merge? So if _either_
schema declares the field as nullable, we take the resultant schema as nullable.
If not, then I'd agree that we probably should keep this explicitly declared
and not infer it here.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]