felipecrv commented on code in PR #507:
URL: https://github.com/apache/arrow-nanoarrow/pull/507#discussion_r1626817305


##########
src/nanoarrow/array.c:
##########
@@ -1163,6 +1214,40 @@ static int ArrowArrayViewValidateFull(struct ArrowArrayView* array_view,
     }
   }
 
+  if (array_view->storage_type == NANOARROW_TYPE_RUN_END_ENCODED) {
+    struct ArrowArrayView* run_ends_view = array_view->children[0];
+    switch (run_ends_view->storage_type) {
+      case NANOARROW_TYPE_INT16:
+      case NANOARROW_TYPE_INT32:
+      case NANOARROW_TYPE_INT64:
+        break;
+      default:
+        ArrowErrorSet(
+            error,
+            "Run-end encoded array only supports INT16, INT32 or INT64 run-ends "
+            "but found run-ends type %s",
+            ArrowTypeString(run_ends_view->storage_type));
+        return EINVAL;
+    }
+    int64_t prev_run_end = 0;
+    for (int64_t i = 0; i < run_ends_view->length; i++) {
+      int64_t run_end = ArrowArrayViewGetIntUnsafe(run_ends_view, i);
+      if (run_end < 0 || run_end < prev_run_end) {
+        return EINVAL;
+      }
+      prev_run_end = run_end;
+    }
+    if (prev_run_end != array_view->length) {
+      ArrowErrorSet(error,
+                    "The last run end value of a run-end encoded array must be equal to "
+                    "the logical length"

Review Comment:
   Nope. :-) And this is what makes run-end encoded arrays so mind-bending.
   
   Think of the child run-ends and values arrays as forming one big virtual
   array that starts at zero and ends at the last run-end. The parent's
   offset/length pair is then just a projection onto this big array.
   
   The parent's offset can land inside any of the runs, and so can the offset
   plus the length: a run doesn't necessarily start at the parent's offset or
   end where the parent's offset+length says.
   
   For example, in this REE
   
   ```
   offset: 1
   length: 3
     run_ends: [2, 5, 6]
     values:   [A, B, C]
   ```
   
   the uncompressed "big array" is `[A A B B B C]`, but the logical content of
   the REE is `[A B B]`.
   
   Both the first and the last run that the REE touches are cut: the first by `offset`, the last by `offset+length`.
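
   To make the projection concrete, here is a minimal sketch in plain C. It is
   not nanoarrow's implementation: the helper names (`ree_find_run`,
   `ree_window_is_valid`, `ree_decode`), the `int32_t` run ends, and the
   `char` values are all assumptions made for illustration. It assumes the
   run ends are already known to be non-negative and non-decreasing, as the
   loop in the diff checks.

   ```c
   #include <assert.h>
   #include <stdint.h>

   // Hypothetical helper: index of the run that holds logical position `i`
   // of the parent (0 <= i < length). Linear scan for clarity; a real
   // implementation would likely use binary search.
   static int64_t ree_find_run(const int32_t* run_ends, int64_t n_runs,
                               int64_t offset, int64_t i) {
     int64_t pos = offset + i;  // position in the virtual "big array"
     for (int64_t r = 0; r < n_runs; r++) {
       if (pos < run_ends[r]) return r;
     }
     return -1;  // pos lies past the last run end
   }

   // Hypothetical check: the parent's window [offset, offset + length) must
   // fit inside the big array. The last run end may be larger than
   // offset + length; it does not have to equal the logical length.
   static int ree_window_is_valid(const int32_t* run_ends, int64_t n_runs,
                                  int64_t offset, int64_t length) {
     assert(offset >= 0 && length >= 0);
     if (n_runs == 0) return length == 0;
     return run_ends[n_runs - 1] >= offset + length;
   }

   // Hypothetical decoder: writes the `length` logical values into `out`,
   // followed by a terminating '\0' (so `out` needs length + 1 bytes).
   static void ree_decode(const int32_t* run_ends, const char* values,
                          int64_t n_runs, int64_t offset, int64_t length,
                          char* out) {
     assert(ree_window_is_valid(run_ends, n_runs, offset, length));
     for (int64_t i = 0; i < length; i++) {
       out[i] = values[ree_find_run(run_ends, n_runs, offset, i)];
     }
     out[length] = '\0';
   }
   ```

   With `run_ends = {2, 5, 6}`, `values = {'A', 'B', 'C'}`, `offset = 1` and
   `length = 3`, `ree_decode` fills `out` with `A`, `B`, `B`. The window is
   valid because the last run end (6) is at least `offset + length` (4), even
   though it is not equal to the logical length (3).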
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.