alamb commented on code in PR #6096:
URL: https://github.com/apache/arrow-rs/pull/6096#discussion_r1688625372


##########
arrow-row/src/lib.rs:
##########
@@ -738,6 +738,23 @@ impl RowConverter {
         }
     }
 
+    /// Create a new [Rows] instance from the given binary data.
+    pub fn from_binary(&self, array: BinaryArray) -> Rows {

Review Comment:
   I wonder if this function needs to be marked `unsafe` -- I am worried that 
someone inserts invalid data into `Rows` here (e.g. modifies the bytes to read 
in invalid UTF8). 
   
   However, I see that there is already a way to convert between `Rows` and 
`[u8]` and then from `[u8]` to `Rows` (e.g `RowParser::parser`)



##########
arrow-row/src/lib.rs:
##########
@@ -2304,6 +2344,24 @@ mod tests {
                 actual.to_data().validate_full().unwrap();
                 dictionary_eq(actual, expected)
             }
+
+            // Check that we can convert
+            let rows = rows.into_binary();
+            let parser = converter.parser();
+            let back = converter
+                .convert_rows(rows.iter().map(|b| parser.parse(b.expect("valid 
bytes"))))
+                .unwrap();
+            for (actual, expected) in back.iter().zip(&arrays) {
+                actual.to_data().validate_full().unwrap();
+                dictionary_eq(actual, expected)
+            }
+
+            let rows = converter.from_binary(rows);
+            let back = converter.convert_rows(&rows).unwrap();
+            for (actual, expected) in back.iter().zip(&arrays) {
+                actual.to_data().validate_full().unwrap();
+                dictionary_eq(actual, expected)
+            }

Review Comment:
   Could you also add some negative tests (like create a BinaryArray from some 
random bytes and try to convert that back to an array and make sure it panics)?



##########
arrow-row/src/lib.rs:
##########
@@ -738,6 +738,23 @@ impl RowConverter {
         }
     }
 
+    /// Create a new [Rows] instance from the given binary data.

Review Comment:
   Can you please add a doc example showing how to do this? 



##########
arrow-row/src/lib.rs:
##########
@@ -868,6 +885,24 @@ impl Rows {
             + self.buffer.len()
             + self.offsets.len() * std::mem::size_of::<usize>()
     }
+
+    /// Create a [BinaryArray] from the [Rows] data without reallocating the
+    /// underlying bytes.
+    pub fn into_binary(self) -> BinaryArray {
+        assert!(
+            self.buffer.len() <= i32::MAX as usize,
+            "rows buffer too large"
+        );
+        let offsets_scalar = 
ScalarBuffer::from_iter(self.offsets.into_iter().map(i32::usize_as));

Review Comment:
   I think this needs to check that the offsets don't overflow a i32 - this 
should be a `try_into` I think and the method should be like `fn 
try_into_binary(self) -> Result<BinaryArray>` 



##########
arrow-row/src/lib.rs:
##########
@@ -868,6 +885,24 @@ impl Rows {
             + self.buffer.len()
             + self.offsets.len() * std::mem::size_of::<usize>()
     }
+
+    /// Create a [BinaryArray] from the [Rows] data without reallocating the
+    /// underlying bytes.
+    pub fn into_binary(self) -> BinaryArray {
+        assert!(

Review Comment:
   I agree `Binary` is likely the most common type. We could potentially add a 
`to_large_binary` to support converting to LargeBinary 🤔 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to