westonpace commented on code in PR #40696:
URL: https://github.com/apache/arrow/pull/40696#discussion_r1535618271


##########
format/substrait/extension_types.yaml:
##########
@@ -42,29 +42,48 @@
 # (but that is an infinite space). Similarly, we would have to declare a
 # timestamp variation for all possible timezone strings.
 
-type_variations:
-  - parent: i8
-    name: u8
-    description: an unsigned 8 bit integer
-    functions: SEPARATE
-  - parent: i16
-    name: u16
-    description: an unsigned 16 bit integer
-    functions: SEPARATE
-  - parent: i32
-    name: u32
-    description: an unsigned 32 bit integer
-    functions: SEPARATE
-  - parent: i64
-    name: u64
-    description: an unsigned 64 bit integer
-    functions: SEPARATE
+# Certain Arrow data types are, from Substrait's point of view, encodings.
+# These include dictionary, the view types (e.g. binary view, list view),
+# and REE.
+#
+# These types are not logically distinct from the type they are encoding.
+# Specifically:
+#  *  There is no value in the decoded type that cannot be represented
+#     in the encoded type and vice versa.

Review Comment:
   Well...if I'm being pedantic...let me expand on my definition:
   
   ```
   Let T1 and T2 be two types.
   There exist functions ENCODE and DECODE such that:
   For every value x in T1 the value DECODE(ENCODE(x)) is equal to x
   For every value y in T2 the value ENCODE(DECODE(y)) is equal to y
   
   AND ALSO
   
   For every function F in "the universe of compute functions" and for every value x in T1:
   
   F(x) = DECODE(F(ENCODE(x)))
   ```
   
   In other words, you can't "encode uint8 into int8 by deciding that 128 maps 
to -1 (etc.)": the correct result of `add(127_i8, 1_i8)` is `ERROR`, but under 
that encoding you would get `-1` instead.
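
To make that concrete, here is a rough Python sketch of both conditions. Everything in it is made up for illustration: `encode`/`decode` are one possible reading of the "128 maps to -1 (etc.)" mapping, `add_u8`/`add_i8` are toy overflow-checked adds (not Arrow or Substrait functions), and `ERROR` is just a sentinel string.

```python
ERROR = "ERROR"  # sentinel standing in for an overflow error (assumption)


def encode(x: int) -> int:
    """Hypothetical uint8 -> int8 mapping: 0..127 unchanged, 128 -> -1, 129 -> -2, ..."""
    return x if x < 128 else 127 - x


def decode(y: int) -> int:
    """Inverse of the hypothetical mapping above."""
    return y if y >= 0 else 127 - y


def add_u8(a: int, b: int):
    """Toy overflow-checked add over uint8 values."""
    result = a + b
    return result if 0 <= result <= 255 else ERROR


def add_i8(a: int, b: int):
    """Toy overflow-checked add over int8 values."""
    result = a + b
    return result if -128 <= result <= 127 else ERROR


def is_round_trip() -> bool:
    """First condition: DECODE(ENCODE(x)) == x and ENCODE(DECODE(y)) == y."""
    forward = all(decode(encode(x)) == x for x in range(256))
    backward = all(encode(decode(y)) == y for y in range(-128, 128))
    return forward and backward


def add_commutes(x: int, y: int) -> bool:
    """Second condition for F = add: F(x, y) == DECODE(F(ENCODE(x), ENCODE(y)))."""
    expected = add_u8(x, y)
    encoded = add_i8(encode(x), encode(y))
    actual = ERROR if encoded == ERROR else decode(encoded)
    return expected == actual


print(is_round_trip())       # the mapping itself is lossless
print(add_commutes(1, 2))    # small values behave the same either way
print(add_commutes(127, 1))  # 128 as uint8, but ERROR when computed as int8
```

So the mapping passes the round-trip condition and still fails the function condition, which is why unsigned integers are not "just an encoding" of signed ones under this definition.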



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
