[GitHub] [arrow] jorisvandenbossche commented on a diff in pull request #34834: GH-33985: [C++] Add substrait serialization/deserialization for expressions

via GitHub Fri, 07 Jul 2023 08:33:00 -0700


jorisvandenbossche commented on code in PR #34834:
URL: https://github.com/apache/arrow/pull/34834#discussion_r1256013840



##########
python/pyarrow/tests/test_compute.py:
##########
@@ -3277,19 +3281,106 @@ def test_expression_serialization():
     f = pc.scalar({'a': 1})
     g = pc.scalar(pa.scalar(1))
     h = pc.scalar(np.int64(2))
+    j = pc.scalar(False)
+
+    literal_exprs = [a, b, c, d, e, g, h, j]
+
+    exprs_with_call = [a == b, a != b, a > b, c & j, c | j, ~c, d.is_valid(),
+                       a + b, a - b, a * b, a / b, pc.negate(a),
+                       pc.add(a, b), pc.subtract(a, b), pc.divide(a, b),
+                       pc.multiply(a, b), pc.power(a, a), pc.sqrt(a),
+                       pc.exp(b), pc.cos(b), pc.sin(b), pc.tan(b),
+                       pc.acos(b), pc.atan(b), pc.asin(b), pc.atan2(b, b),
+                       pc.abs(b), pc.sign(a), pc.bit_wise_not(a),
+                       pc.bit_wise_and(a, a), pc.bit_wise_or(a, a),
+                       pc.bit_wise_xor(a, a), pc.is_nan(b), pc.is_finite(b),
+                       pc.coalesce(a, b),
+                       a.cast(pa.int32(), safe=False)]
+
+    exprs_with_ref = [pc.field('i64') > 5, pc.field('i64') == 5,
+                      pc.field('i64') == 7,
+                      pc.field(('foo', 'bar')) == 'value',
+                      pc.field('foo', 'bar') == 'value']
+
+    special_cases = [
+        f,  # Struct literals lose their field names
+        a.isin([1, 2, 3]),  # isin converts to an or list
+        pc.field('i64').is_null()  # pyarrow always specifies a FunctionOptions
+        # for is_null which, being the default, is
+        # dropped on serialization
+    ]
 
-    all_exprs = [a, b, c, d, e, f, g, h, a == b, a > b, a & b, a | b, ~c,
-                 d.is_valid(), a.cast(pa.int32(), safe=False),
-                 a.cast(pa.int32(), safe=False), a.isin([1, 2, 3]),
-                 pc.field('i64') > 5, pc.field('i64') == 5,
-                 pc.field('i64') == 7, pc.field('i64').is_null(),
-                 pc.field(('foo', 'bar')) == 'value',
-                 pc.field('foo', 'bar') == 'value']
+    all_exprs = literal_exprs + exprs_with_call + exprs_with_ref + 
special_cases
     for expr in all_exprs:
         assert isinstance(expr, pc.Expression)
         restored = pickle.loads(pickle.dumps(expr))
         assert expr.equals(restored)
 
+    if pas is not None:
+
+        test_schema = pa.schema([pa.field("i64", pa.int64()), pa.field(
+            "foo", pa.struct([pa.field("bar", pa.string())]))])
+
+        # Basic literals don't change on binding and so they will round
+        # trip without any change
+        for expr in literal_exprs:
+            serialized = expr.to_substrait(test_schema)
+            deserialized = pc.Expression.from_substrait(serialized)
+            assert expr.equals(deserialized)
+
+        # Expressions are bound when they get serialized.  Since bound
+        # expressions are not equal to their unbound variants we cannot
+        # compare the round tripped with the original
+        for expr in exprs_with_call:
+            serialized = expr.to_substrait(test_schema)
+            deserialized = pc.Expression.from_substrait(serialized)
+            # We can't compare the expressions themselves because of the bound
+            # unbound difference. But we can compare the string representation

Review Comment:
   I see, thanks for the explanation! I hadn't considered that the function vs 
kernel distinction



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [arrow] jorisvandenbossche commented on a diff in pull request #34834: GH-33985: [C++] Add substrait serialization/deserialization for expressions

Reply via email to