[GitHub] [arrow] returnString commented on a change in pull request #10063: ARROW-12411: [Rust] Add Builder interface for adding Arrays to RecordBatches

GitBox Fri, 16 Apr 2021 08:40:46 -0700


returnString commented on a change in pull request #10063:
URL: https://github.com/apache/arrow/pull/10063#discussion_r614932667




##########
File path: rust/arrow/src/record_batch.rs
##########
@@ -245,6 +295,12 @@ impl RecordBatch {
     }
 }
 
+impl Default for RecordBatch {

Review comment:
       Not totally sure about this because it bypasses the invariants of 
`validate_new_batch` - it seems like we don't want to allow for 0-column record 
batches to exist. Might be an argument in favour of the `impl From` approach?

##########
File path: rust/arrow/src/record_batch.rs
##########
@@ -103,6 +103,56 @@ impl RecordBatch {
         RecordBatch { schema, columns }
     }
 
+    /// Creates a new [`RecordBatch`] with no columns
+    ///
+    /// TODO add an code example using `append`
+    pub fn new() -> Self {
+        Self {
+            schema: Arc::new(Schema::empty()),
+            columns: Vec::new(),
+        }
+    }
+
+    /// Appends the `field_array` array to this `RecordBatch` as a
+    /// field named `field_name`.
+    ///
+    /// TODO: code example
+    ///
+    /// TODO: on error, can we return `Self` in some meaningful way?
+    pub fn append(self, field_name: &str, field_values: ArrayRef) -> 
Result<Self> {
+        if let Some(col) = self.columns.get(0) {
+            if col.len() != field_values.len() {
+                return Err(ArrowError::InvalidArgumentError(
+                    format!("all columns in a record batch must have the same 
length. expected {}, field {} had {} ",
+                            col.len(), field_name, field_values.len())
+                ));
+            }
+        }
+
+        let Self {
+            schema,
+            mut columns,
+        } = self;
+
+        // modify the schema we have if possible, otherwise copy
+        let mut schema = match Arc::try_unwrap(schema) {
+            Ok(schema) => schema,
+            Err(shared_schema) => shared_schema.as_ref().clone(),
+        };
+
+        let nullable = field_values.null_count() > 0;

Review comment:
       This isn't an area that I'm super-familiar with, but are there any 
existing methods or expectations around merging schemas that differ solely on 
nullability, e.g. merging two schemas together and inferring the nullability of 
each field by ORing the nullability on each side of the merge? So if _either_ 
schema declares the field as nullable, we take the resultant schema as nullable.
   
   If not, then I'd agree that we probably should keep this explicitly declared 
and not infer it here.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [arrow] returnString commented on a change in pull request #10063: ARROW-12411: [Rust] Add Builder interface for adding Arrays to RecordBatches

Reply via email to