fresh-borzoni commented on code in PR #365:
URL: https://github.com/apache/fluss-rust/pull/365#discussion_r2840898915


##########
crates/fluss/src/row/mod.rs:
##########
@@ -63,58 +65,58 @@ pub trait InternalRow: Send + Sync {
     fn is_null_at(&self, pos: usize) -> bool;
 
     /// Returns the boolean value at the given position
-    fn get_boolean(&self, pos: usize) -> bool;
+    fn get_boolean(&self, pos: usize) -> Result<bool>;

Review Comment:
   I don't actually see the benefit for this, so we save on `?`, but at the 
same time here it's not the same as in rust-postgeres, as in there you can 
always trust the row - `Row` is a single concrete type with always-trusted 
server data. 
   
   in fluss-rust, users input constructs GenericRow- so it's susceptible to 
panics, so it makes sense to return Result for this and not have mirror surface.



##########
crates/fluss/src/row/column.rs:
##########
@@ -187,102 +190,130 @@ impl InternalRow for ColumnarRow {
         self.record_batch.column(pos).is_null(self.row_id)
     }
 
-    fn get_boolean(&self, pos: usize) -> bool {
-        self.record_batch
+    fn get_boolean(&self, pos: usize) -> Result<bool> {
+        Ok(self
+            .record_batch
             .column(pos)
             .as_boolean()

Review Comment:
   `as_boolean_opt()` with `.ok_or_else` following the pattern as other getters?



##########
crates/fluss/src/row/mod.rs:
##########
@@ -127,98 +129,149 @@ pub struct GenericRow<'a> {
     pub values: Vec<Datum<'a>>,
 }
 
+impl<'a> GenericRow<'a> {
+    fn get_value(&self, pos: usize) -> Result<&Datum<'a>> {
+        self.values.get(pos).ok_or_else(|| IllegalArgument {
+            message: format!(
+                "position {pos} out of bounds (row has {} fields)",
+                self.values.len()
+            ),
+        })
+    }
+
+    fn try_convert<T: TryFrom<&'a Datum<'a>>>(
+        &'a self,
+        pos: usize,
+        expected_type: &str,
+    ) -> Result<T> {
+        let datum = self.get_value(pos)?;
+        T::try_from(datum).map_err(|_| IllegalArgument {
+            message: format!(
+                "type mismatch at position {pos}: expected {expected_type}, 
got {datum:?}"
+            ),
+        })
+    }
+}
+
 impl<'a> InternalRow for GenericRow<'a> {
     fn get_field_count(&self) -> usize {
         self.values.len()
     }
 
     fn is_null_at(&self, pos: usize) -> bool {
-        self.values
-            .get(pos)
-            .expect("position out of bounds")
-            .is_null()
+        self.values.get(pos).is_none_or(|v| v.is_null())

Review Comment:
   +1 as the result of this change we would have inconsistent behavious across 
different implementations of InternalRow:
     - GenericRow::is_null_at(OOB) -> true
     - ColumnarRow::is_null_at(OOB) -> panics (Arrow's column(pos) panics)
     - CompactedRow::is_null_at(OOB) -> panics (slice indexing fields()[pos])
     
     and this issue, we basically produce Null for OOB:
     ```rust
     impl FieldGetter {
       pub fn get_field<'a>(&self, row: &'a dyn InternalRow) -> Datum<'a> {
           match self {
               FieldGetter::Nullable(getter) => {
                   if row.is_null_at(getter.pos()) {
                       Datum::Null
                   } else {
                       getter.get_field(row)
                   }
               }
               FieldGetter::NonNullable(getter) => getter.get_field(row),
           }
       }
   ```
   While `check_field_count` guards the `append()/upsert()` paths, is_null_at 
is a public trait method. Users calling it directly get a lie instead of an 
error.
   
   I would make get_ to return `Result`, but is_null_at leave as strict - it's 
programming error, so it's fine to panic



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to