[GitHub] [arrow] lidavidm commented on a change in pull request #11882: ARROW-9843: [C++] Implement Between ternary kernel

GitBox Tue, 11 Jan 2022 13:46:39 -0800


lidavidm commented on a change in pull request #11882:
URL: https://github.com/apache/arrow/pull/11882#discussion_r782543512




##########
File path: cpp/src/arrow/compute/api_scalar.h
##########
@@ -1350,5 +1371,24 @@ ARROW_EXPORT Result<Datum> AssumeTimezone(const Datum& 
values,
                                           AssumeTimezoneOptions options,
                                           ExecContext* ctx = NULLPTR);
 
+/// \brief Between compares each element in `values`
+/// with `left` as a lower bound and 'right' as an upper bound
+///
+/// \param[in] values input to compare between left and right
+/// \param[in] left used as the lower bound for comparison
+/// \param[in] right used as the upper bound for comparison
+/// \param[in] options for bounds, default is inclusive of both, other

Review comment:
       nit: was this meant to say something like `inclusive of both endpoints`?

##########
File path: python/pyarrow/tests/test_compute.py
##########
@@ -1262,6 +1263,300 @@ def test_filter_null_type():
     assert len(table.filter(mask).column(0)) == 5
 
 
[email protected]("ty", ["inclusive"])
+def test_between_array_array_array(ty):
+    BetweenOptions = partial(pc.BetweenOptions)
+
+    val = pa.array([1, 1, 4, 3, 2, 6])
+    arr1 = pa.array([1, 1, 3, 4, None, 5])
+    arr2 = pa.array([1, 2, 4, None, 4, 7])
+
+    inclusive_and_expected = {
+        "both": [True, True, True, None, None, True],
+        "left": [False, True, False, None, None, True],
+        "right": [False, False, True, None, None, True],
+        "neither": [False, False, False, None, None, True],
+    }
+
+    for inclusive, expected in inclusive_and_expected.items():
+        options = BetweenOptions(inclusive=inclusive)
+        result = pc.between(val, arr1, arr2, options=options)
+        np.testing.assert_array_equal(result, pa.array(expected))

Review comment:
       Is there perhaps a way to leverage Pandas or NumPy as a reference point 
to combine some of these test cases?

##########
File path: docs/source/python/api/compute.rst
##########
@@ -158,6 +158,7 @@ they return ``null``.
    less
    less_equal
    not_equal
+   between

Review comment:
       nit: keep in sorted order?

##########
File path: cpp/src/arrow/compute/kernels/scalar_compare_benchmark.cc
##########
@@ -77,5 +77,44 @@ BENCHMARK(GreaterArrayScalarInt64)->Apply(RegressionSetArgs);
 BENCHMARK(GreaterArrayArrayString)->Apply(RegressionSetArgs);
 BENCHMARK(GreaterArrayScalarString)->Apply(RegressionSetArgs);
 
+template <typename Type>
+static void BetweenScalarArrayScalar(benchmark::State& state) {
+  RegressionArgs args(state, /*size_is_bytes=*/false);
+  auto ty = TypeTraits<Type>::type_singleton();
+  auto rand = random::RandomArrayGenerator(kSeed);
+  auto array = rand.ArrayOf(ty, args.size, args.null_proportion);
+  auto scalar_left = *rand.ArrayOf(ty, 1, 0)->GetScalar(0);
+  auto scalar_right = *rand.ArrayOf(ty, 1, 0)->GetScalar(0);
+  for (auto _ : state) {
+    ABORT_NOT_OK(
+        CallFunction("between_less_equal_less_equal", {array, scalar_left, 
scalar_right})
+            .status());
+  }
+}
+
+template <typename Type>
+static void BetweenArrayArrayArray(benchmark::State& state) {
+  RegressionArgs args(state, /*size_is_bytes=*/false);
+  auto ty = TypeTraits<Type>::type_singleton();
+  auto rand = random::RandomArrayGenerator(kSeed);
+  auto lhs = rand.ArrayOf(ty, args.size, args.null_proportion);
+  auto mid = rand.ArrayOf(ty, args.size, args.null_proportion);
+  auto rhs = rand.ArrayOf(ty, args.size, args.null_proportion);
+  for (auto _ : state) {
+    ABORT_NOT_OK(CallFunction("between_less_equal_less_equal", {mid, lhs, 
rhs}).status());
+  }
+}
+
+// static void BetweenArrayArrayArrayInt64(benchmark::State& state) {
+//   BetweenArrayArrayArray<Int64Type>(state);
+// }
+//
+// static void BetweenScalarArrayScalarInt64(benchmark::State& state) {
+//   BetweenScalarArrayScalar<Int64Type>(state);
+// }
+
+// BENCHMARK(BetweenArrayArrayArrayInt64)->Apply(RegressionSetArgs);
+// BENCHMARK(BetweenScalarArrayScalarInt64)->Apply(RegressionSetArgs);

Review comment:
       nit: commented-out code, was this meant to be removed or expanded into a 
full benchmark?

##########
File path: cpp/src/arrow/compute/kernels/scalar_compare_benchmark.cc
##########
@@ -77,5 +77,44 @@ BENCHMARK(GreaterArrayScalarInt64)->Apply(RegressionSetArgs);
 BENCHMARK(GreaterArrayArrayString)->Apply(RegressionSetArgs);
 BENCHMARK(GreaterArrayScalarString)->Apply(RegressionSetArgs);
 
+template <typename Type>
+static void BetweenScalarArrayScalar(benchmark::State& state) {
+  RegressionArgs args(state, /*size_is_bytes=*/false);
+  auto ty = TypeTraits<Type>::type_singleton();
+  auto rand = random::RandomArrayGenerator(kSeed);
+  auto array = rand.ArrayOf(ty, args.size, args.null_proportion);
+  auto scalar_left = *rand.ArrayOf(ty, 1, 0)->GetScalar(0);
+  auto scalar_right = *rand.ArrayOf(ty, 1, 0)->GetScalar(0);
+  for (auto _ : state) {
+    ABORT_NOT_OK(
+        CallFunction("between_less_equal_less_equal", {array, scalar_left, 
scalar_right})
+            .status());
+  }
+}
+
+template <typename Type>
+static void BetweenArrayArrayArray(benchmark::State& state) {
+  RegressionArgs args(state, /*size_is_bytes=*/false);
+  auto ty = TypeTraits<Type>::type_singleton();
+  auto rand = random::RandomArrayGenerator(kSeed);
+  auto lhs = rand.ArrayOf(ty, args.size, args.null_proportion);
+  auto mid = rand.ArrayOf(ty, args.size, args.null_proportion);
+  auto rhs = rand.ArrayOf(ty, args.size, args.null_proportion);
+  for (auto _ : state) {
+    ABORT_NOT_OK(CallFunction("between_less_equal_less_equal", {mid, lhs, 
rhs}).status());
+  }
+}
+
+// static void BetweenArrayArrayArrayInt64(benchmark::State& state) {
+//   BetweenArrayArrayArray<Int64Type>(state);
+// }
+//
+// static void BetweenScalarArrayScalarInt64(benchmark::State& state) {
+//   BetweenScalarArrayScalar<Int64Type>(state);
+// }
+
+// BENCHMARK(BetweenArrayArrayArrayInt64)->Apply(RegressionSetArgs);
+// BENCHMARK(BetweenScalarArrayScalarInt64)->Apply(RegressionSetArgs);

Review comment:
       It seems the benchmark definitions above are otherwise unused.

##########
File path: docs/source/cpp/compute.rst
##########
@@ -664,35 +664,43 @@ Decimal values are accepted, but are cast to Float64 
first.
 Comparisons
 ~~~~~~~~~~~
 
-These functions expect two inputs of numeric type (in which case they will be
+These functions expect two or three inputs of numeric type (in which case they 
will be
 cast to the :ref:`common numeric type <common-numeric-type>` before 
comparison),
-or two inputs of Binary- or String-like types, or two inputs of Temporal types.
-If any input is dictionary encoded it will be expanded for the purposes of
-comparison. If any of the input elements in a pair is null, the corresponding
+or two or three inputs of Binary- or String-like types, or two or three inputs 
of Temporal 
+types. If any input is dictionary encoded it will be expanded for the purposes 
of
+comparison. If any of the input elements in a pair or triple is null, the 
corresponding
 output element is null. Decimal arguments will be promoted in the same way as
 for ``add`` and ``subtract``.
 
-+----------------+------------+---------------------------------------------+---------------------+
-| Function names | Arity      | Input types                                 | 
Output type         |
-+================+============+=============================================+=====================+
-| equal          | Binary     | Numeric, Temporal, Binary- and String-like  | 
Boolean             |
-+----------------+------------+---------------------------------------------+---------------------+
-| greater        | Binary     | Numeric, Temporal, Binary- and String-like  | 
Boolean             |
-+----------------+------------+---------------------------------------------+---------------------+
-| greater_equal  | Binary     | Numeric, Temporal, Binary- and String-like  | 
Boolean             |
-+----------------+------------+---------------------------------------------+---------------------+
-| less           | Binary     | Numeric, Temporal, Binary- and String-like  | 
Boolean             |
-+----------------+------------+---------------------------------------------+---------------------+
-| less_equal     | Binary     | Numeric, Temporal, Binary- and String-like  | 
Boolean             |
-+----------------+------------+---------------------------------------------+---------------------+
-| not_equal      | Binary     | Numeric, Temporal, Binary- and String-like  | 
Boolean             |
-+----------------+------------+---------------------------------------------+---------------------+
++----------------+---------+---------------------------------------------+-------------+--------------------------+-------+
+| Function names | Arity   | Input types                                 | 
Output type | Options Class            | Notes |
++================+=========+=============================================+=============+==========================+=======+
+| equal          | Binary  | Numeric, Temporal, Binary- and String-like  | 
Boolean     |                          |       |
++----------------+---------+---------------------------------------------+-------------+--------------------------+-------+
+| greater        | Binary  | Numeric, Temporal, Binary- and String-like  | 
Boolean     |                          |       |
++----------------+---------+---------------------------------------------+-------------+--------------------------+-------+
+| greater_equal  | Binary  | Numeric, Temporal, Binary- and String-like  | 
Boolean     |                          |       |
++----------------+---------+---------------------------------------------+-------------+--------------------------+-------+
+| less           | Binary  | Numeric, Temporal, Binary- and String-like  | 
Boolean     |                          |       |
++----------------+---------+---------------------------------------------+-------------+--------------------------+-------+
+| less_equal     | Binary  | Numeric, Temporal, Binary- and String-like  | 
Boolean     |                          |       |
++----------------+---------+---------------------------------------------+-------------+--------------------------+-------+
+| not_equal      | Binary  | Numeric, Temporal, Binary- and String-like  | 
Boolean     |                          |       |
++----------------+---------+---------------------------------------------+-------------+--------------------------+-------+
+| between        | Ternary | Numeric, Temporal, Binary- and String-like  | 
Boolean     | :struct:`BetweenOptions` | \(1)  |

Review comment:
       nit: keep in sorted order?

##########
File path: cpp/src/arrow/compute/api_scalar.h
##########
@@ -1350,5 +1371,24 @@ ARROW_EXPORT Result<Datum> AssumeTimezone(const Datum& 
values,
                                           AssumeTimezoneOptions options,
                                           ExecContext* ctx = NULLPTR);
 
+/// \brief Between compares each element in `values`
+/// with `left` as a lower bound and 'right' as an upper bound
+///
+/// \param[in] values input to compare between left and right
+/// \param[in] left used as the lower bound for comparison
+/// \param[in] right used as the upper bound for comparison
+/// \param[in] options for bounds, default is inclusive of both, other
+///  endpoints, other choices are left (exclude left endpoint), right
+///  (exclude right endpoint) and both (exclude both endpoints), optional
+/// \param[in] ctx the function execution context, optional
+///
+/// \return the resulting datum

Review comment:
       Also `\since 7.0.0` (optimistically :slightly_smiling_face:)?

##########
File path: cpp/src/arrow/compute/api_scalar.h
##########
@@ -316,6 +316,21 @@ struct ARROW_EXPORT CompareOptions {
   enum CompareOperator op;
 };
 
+enum class BetweenMode : int8_t {
+  LESS_EQUAL_LESS_EQUAL,
+  LESS_EQUAL_LESS,
+  LESS_LESS_EQUAL,
+  LESS_LESS,
+};

Review comment:
       Hmm, @edponce is this structure what you expected? I think for 
consistency with other options it would look like this:
   
   ```cpp
   class BetweenOptions {
    public:
     enum Inclusive {
       BOTH,
       // ...
     };
     // ...
   };
   ```
   
   ...where `enum` is deliberately used over `enum class` so that the enum 
members can be accessed as `BetweenOptions::BOTH`.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [arrow] lidavidm commented on a change in pull request #11882: ARROW-9843: [C++] Implement Between ternary kernel

Reply via email to