alamb commented on code in PR #3174:
URL: https://github.com/apache/arrow-rs/pull/3174#discussion_r1030639438


##########
arrow/src/row/mod.rs:
##########
@@ -505,6 +518,43 @@ impl RowConverter {
     }
 }
 
+/// A [`RowParser`] can be created from a [`RowConverter`] and used to parse bytes to [`Row`]
+#[derive(Debug)]
+pub struct RowParser {
+    config: RowConfig,
+}
+
+impl RowParser {
+    fn new(fields: Arc<[SortField]>) -> Self {
+        Self {
+            config: RowConfig {
+                fields,
+                validate_utf8: true,
+            },
+        }
+    }
+
+    /// Creates a [`Row`] from the provided `bytes`.
+    ///
+    /// `bytes` must be a [`Row`] produced by the [`RowConverter`] associated with
+    /// this [`RowParser`], otherwise subsequent operations with the produced [`Row`] may panic
+    pub fn parse<'a>(&'a self, bytes: &'a [u8]) -> Row<'a> {
+        Row {
+            data: bytes,
+            config: &self.config,
+        }
+    }
+}
+
+/// The config of a given set of [`Row`]
+#[derive(Debug, Clone)]
+struct RowConfig {
+    /// The schema for these rows
+    fields: Arc<[SortField]>,
+    /// Whether to run UTF-8 validation when converting to arrow arrays

Review Comment:
   Perhaps it would be wise to note here that UTF-8 validation is required
when reading bytes that may have been modified?
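
To illustrate the concern, here is a minimal sketch using only the standard library. `decode_checked` is a hypothetical stand-in for decoding a string value from row bytes, not an arrow-rs API. It shows that bytes altered after encoding (here, truncated in the middle of a code point) fail validation, which is the case an unchecked conversion could not detect.

```rust
use std::str::Utf8Error;

/// Hypothetical stand-in for decoding a string value out of row bytes.
/// This is not an arrow-rs function; it only wraps the standard library check.
fn decode_checked(bytes: &[u8]) -> Result<&str, Utf8Error> {
    // `from_utf8` scans the whole slice and rejects malformed sequences.
    std::str::from_utf8(bytes)
}

/// Simulates bytes that were modified outside the encoder by cutting the
/// payload short, possibly in the middle of a multi-byte code point.
fn truncate_payload(bytes: &[u8], len: usize) -> Vec<u8> {
    bytes[..len.min(bytes.len())].to_vec()
}
```

For example, `"héllo"` encodes `é` as two bytes, so `truncate_payload("héllo".as_bytes(), 2)` leaves a dangling lead byte and `decode_checked` returns an error for it, while the untouched bytes decode successfully.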
   
   



##########
arrow/src/row/mod.rs:
##########
@@ -465,14 +470,15 @@ impl RowConverter {
     where
         I: IntoIterator<Item = Row<'a>>,
     {
+        let mut validate_utf8 = false;
         let mut rows: Vec<_> = rows
             .into_iter()
             .map(|row| {
                 assert!(
-                    Arc::ptr_eq(row.fields, &self.fields),
+                    Arc::ptr_eq(&row.config.fields, &self.fields),
                     "rows were not produced by this RowConverter"
                 );
-
+                validate_utf8 |= row.config.validate_utf8;

Review Comment:
   It seems strange that some rows would have `validate_utf8` set and others
would not if they all came from the same row converter. Maybe we could assert
that they are all the same?
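
A rough sketch of what such an assertion might look like, using simplified stand-ins for the PR's `RowConfig` and `Row` (the real types also carry `fields` and `data`). `shared_validate_utf8` is a hypothetical helper name. Note that, per the first hunk, rows created through `RowParser` set `validate_utf8: true`, so whether mixed inputs are actually an error depends on how the PR intends parsed and converter-produced rows to be combined.

```rust
/// Simplified stand-in for the PR's `RowConfig`; only the flag is modelled.
#[derive(Debug, Clone, PartialEq)]
struct RowConfig {
    validate_utf8: bool,
}

/// Simplified stand-in for the PR's `Row`; the byte payload is omitted.
struct Row<'a> {
    config: &'a RowConfig,
}

/// Returns the `validate_utf8` flag shared by all rows, or `None` if the
/// input is empty. Panics if two rows disagree on the flag.
fn shared_validate_utf8<'a>(rows: impl IntoIterator<Item = Row<'a>>) -> Option<bool> {
    let mut shared: Option<bool> = None;
    for row in rows {
        let flag = row.config.validate_utf8;
        match shared {
            // First row seen: remember its setting.
            None => shared = Some(flag),
            // Later rows must match the first one.
            Some(prev) => assert_eq!(
                prev, flag,
                "rows have mixed validate_utf8 settings"
            ),
        }
    }
    shared
}
```

Compared with the `validate_utf8 |= row.config.validate_utf8` accumulation in the hunk, this version fails loudly on mixed inputs instead of silently validating everything.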



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
