lidavidm commented on a change in pull request #11882:
URL: https://github.com/apache/arrow/pull/11882#discussion_r782543512
##########
File path: cpp/src/arrow/compute/api_scalar.h
##########
@@ -1350,5 +1371,24 @@ ARROW_EXPORT Result<Datum> AssumeTimezone(const Datum&
values,
AssumeTimezoneOptions options,
ExecContext* ctx = NULLPTR);
+/// \brief Between compares each element in `values`
+/// with `left` as a lower bound and 'right' as an upper bound
+///
+/// \param[in] values input to compare between left and right
+/// \param[in] left used as the lower bound for comparison
+/// \param[in] right used as the upper bound for comparison
+/// \param[in] options for bounds, default is inclusive of both, other
Review comment:
nit: was this meant to say something like `inclusive of both endpoints`?
##########
File path: python/pyarrow/tests/test_compute.py
##########
@@ -1262,6 +1263,300 @@ def test_filter_null_type():
assert len(table.filter(mask).column(0)) == 5
[email protected]("ty", ["inclusive"])
+def test_between_array_array_array(ty):
+ BetweenOptions = partial(pc.BetweenOptions)
+
+ val = pa.array([1, 1, 4, 3, 2, 6])
+ arr1 = pa.array([1, 1, 3, 4, None, 5])
+ arr2 = pa.array([1, 2, 4, None, 4, 7])
+
+ inclusive_and_expected = {
+ "both": [True, True, True, None, None, True],
+ "left": [False, True, False, None, None, True],
+ "right": [False, False, True, None, None, True],
+ "neither": [False, False, False, None, None, True],
+ }
+
+ for inclusive, expected in inclusive_and_expected.items():
+ options = BetweenOptions(inclusive=inclusive)
+ result = pc.between(val, arr1, arr2, options=options)
+ np.testing.assert_array_equal(result, pa.array(expected))
Review comment:
Is there perhaps a way to leverage Pandas or NumPy as a reference point
to combine some of these test cases?
##########
File path: docs/source/python/api/compute.rst
##########
@@ -158,6 +158,7 @@ they return ``null``.
less
less_equal
not_equal
+ between
Review comment:
nit: keep in sorted order?
##########
File path: cpp/src/arrow/compute/kernels/scalar_compare_benchmark.cc
##########
@@ -77,5 +77,44 @@ BENCHMARK(GreaterArrayScalarInt64)->Apply(RegressionSetArgs);
BENCHMARK(GreaterArrayArrayString)->Apply(RegressionSetArgs);
BENCHMARK(GreaterArrayScalarString)->Apply(RegressionSetArgs);
+template <typename Type>
+static void BetweenScalarArrayScalar(benchmark::State& state) {
+ RegressionArgs args(state, /*size_is_bytes=*/false);
+ auto ty = TypeTraits<Type>::type_singleton();
+ auto rand = random::RandomArrayGenerator(kSeed);
+ auto array = rand.ArrayOf(ty, args.size, args.null_proportion);
+ auto scalar_left = *rand.ArrayOf(ty, 1, 0)->GetScalar(0);
+ auto scalar_right = *rand.ArrayOf(ty, 1, 0)->GetScalar(0);
+ for (auto _ : state) {
+ ABORT_NOT_OK(
+ CallFunction("between_less_equal_less_equal", {array, scalar_left,
scalar_right})
+ .status());
+ }
+}
+
+template <typename Type>
+static void BetweenArrayArrayArray(benchmark::State& state) {
+ RegressionArgs args(state, /*size_is_bytes=*/false);
+ auto ty = TypeTraits<Type>::type_singleton();
+ auto rand = random::RandomArrayGenerator(kSeed);
+ auto lhs = rand.ArrayOf(ty, args.size, args.null_proportion);
+ auto mid = rand.ArrayOf(ty, args.size, args.null_proportion);
+ auto rhs = rand.ArrayOf(ty, args.size, args.null_proportion);
+ for (auto _ : state) {
+ ABORT_NOT_OK(CallFunction("between_less_equal_less_equal", {mid, lhs,
rhs}).status());
+ }
+}
+
+// static void BetweenArrayArrayArrayInt64(benchmark::State& state) {
+// BetweenArrayArrayArray<Int64Type>(state);
+// }
+//
+// static void BetweenScalarArrayScalarInt64(benchmark::State& state) {
+// BetweenScalarArrayScalar<Int64Type>(state);
+// }
+
+// BENCHMARK(BetweenArrayArrayArrayInt64)->Apply(RegressionSetArgs);
+// BENCHMARK(BetweenScalarArrayScalarInt64)->Apply(RegressionSetArgs);
Review comment:
nit: commented-out code, was this meant to be removed or expanded into a
full benchmark?
##########
File path: cpp/src/arrow/compute/kernels/scalar_compare_benchmark.cc
##########
@@ -77,5 +77,44 @@ BENCHMARK(GreaterArrayScalarInt64)->Apply(RegressionSetArgs);
BENCHMARK(GreaterArrayArrayString)->Apply(RegressionSetArgs);
BENCHMARK(GreaterArrayScalarString)->Apply(RegressionSetArgs);
+template <typename Type>
+static void BetweenScalarArrayScalar(benchmark::State& state) {
+ RegressionArgs args(state, /*size_is_bytes=*/false);
+ auto ty = TypeTraits<Type>::type_singleton();
+ auto rand = random::RandomArrayGenerator(kSeed);
+ auto array = rand.ArrayOf(ty, args.size, args.null_proportion);
+ auto scalar_left = *rand.ArrayOf(ty, 1, 0)->GetScalar(0);
+ auto scalar_right = *rand.ArrayOf(ty, 1, 0)->GetScalar(0);
+ for (auto _ : state) {
+ ABORT_NOT_OK(
+ CallFunction("between_less_equal_less_equal", {array, scalar_left,
scalar_right})
+ .status());
+ }
+}
+
+template <typename Type>
+static void BetweenArrayArrayArray(benchmark::State& state) {
+ RegressionArgs args(state, /*size_is_bytes=*/false);
+ auto ty = TypeTraits<Type>::type_singleton();
+ auto rand = random::RandomArrayGenerator(kSeed);
+ auto lhs = rand.ArrayOf(ty, args.size, args.null_proportion);
+ auto mid = rand.ArrayOf(ty, args.size, args.null_proportion);
+ auto rhs = rand.ArrayOf(ty, args.size, args.null_proportion);
+ for (auto _ : state) {
+ ABORT_NOT_OK(CallFunction("between_less_equal_less_equal", {mid, lhs,
rhs}).status());
+ }
+}
+
+// static void BetweenArrayArrayArrayInt64(benchmark::State& state) {
+// BetweenArrayArrayArray<Int64Type>(state);
+// }
+//
+// static void BetweenScalarArrayScalarInt64(benchmark::State& state) {
+// BetweenScalarArrayScalar<Int64Type>(state);
+// }
+
+// BENCHMARK(BetweenArrayArrayArrayInt64)->Apply(RegressionSetArgs);
+// BENCHMARK(BetweenScalarArrayScalarInt64)->Apply(RegressionSetArgs);
Review comment:
It seems the benchmark definitions above are otherwise unused.
##########
File path: docs/source/cpp/compute.rst
##########
@@ -664,35 +664,43 @@ Decimal values are accepted, but are cast to Float64
first.
Comparisons
~~~~~~~~~~~
-These functions expect two inputs of numeric type (in which case they will be
+These functions expect two or three inputs of numeric type (in which case they
will be
cast to the :ref:`common numeric type <common-numeric-type>` before
comparison),
-or two inputs of Binary- or String-like types, or two inputs of Temporal types.
-If any input is dictionary encoded it will be expanded for the purposes of
-comparison. If any of the input elements in a pair is null, the corresponding
+or two or three inputs of Binary- or String-like types, or two or three inputs
of Temporal
+types. If any input is dictionary encoded it will be expanded for the purposes
of
+comparison. If any of the input elements in a pair or triple is null, the
corresponding
output element is null. Decimal arguments will be promoted in the same way as
for ``add`` and ``subtract``.
-+----------------+------------+---------------------------------------------+---------------------+
-| Function names | Arity | Input types |
Output type |
-+================+============+=============================================+=====================+
-| equal | Binary | Numeric, Temporal, Binary- and String-like |
Boolean |
-+----------------+------------+---------------------------------------------+---------------------+
-| greater | Binary | Numeric, Temporal, Binary- and String-like |
Boolean |
-+----------------+------------+---------------------------------------------+---------------------+
-| greater_equal | Binary | Numeric, Temporal, Binary- and String-like |
Boolean |
-+----------------+------------+---------------------------------------------+---------------------+
-| less | Binary | Numeric, Temporal, Binary- and String-like |
Boolean |
-+----------------+------------+---------------------------------------------+---------------------+
-| less_equal | Binary | Numeric, Temporal, Binary- and String-like |
Boolean |
-+----------------+------------+---------------------------------------------+---------------------+
-| not_equal | Binary | Numeric, Temporal, Binary- and String-like |
Boolean |
-+----------------+------------+---------------------------------------------+---------------------+
++----------------+---------+---------------------------------------------+-------------+--------------------------+-------+
+| Function names | Arity | Input types |
Output type | Options Class | Notes |
++================+=========+=============================================+=============+==========================+=======+
+| equal | Binary | Numeric, Temporal, Binary- and String-like |
Boolean | | |
++----------------+---------+---------------------------------------------+-------------+--------------------------+-------+
+| greater | Binary | Numeric, Temporal, Binary- and String-like |
Boolean | | |
++----------------+---------+---------------------------------------------+-------------+--------------------------+-------+
+| greater_equal | Binary | Numeric, Temporal, Binary- and String-like |
Boolean | | |
++----------------+---------+---------------------------------------------+-------------+--------------------------+-------+
+| less | Binary | Numeric, Temporal, Binary- and String-like |
Boolean | | |
++----------------+---------+---------------------------------------------+-------------+--------------------------+-------+
+| less_equal | Binary | Numeric, Temporal, Binary- and String-like |
Boolean | | |
++----------------+---------+---------------------------------------------+-------------+--------------------------+-------+
+| not_equal | Binary | Numeric, Temporal, Binary- and String-like |
Boolean | | |
++----------------+---------+---------------------------------------------+-------------+--------------------------+-------+
+| between | Ternary | Numeric, Temporal, Binary- and String-like |
Boolean | :struct:`BetweenOptions` | \(1) |
Review comment:
nit: keep in sorted order?
##########
File path: cpp/src/arrow/compute/api_scalar.h
##########
@@ -1350,5 +1371,24 @@ ARROW_EXPORT Result<Datum> AssumeTimezone(const Datum&
values,
AssumeTimezoneOptions options,
ExecContext* ctx = NULLPTR);
+/// \brief Between compares each element in `values`
+/// with `left` as a lower bound and 'right' as an upper bound
+///
+/// \param[in] values input to compare between left and right
+/// \param[in] left used as the lower bound for comparison
+/// \param[in] right used as the upper bound for comparison
+/// \param[in] options for bounds, default is inclusive of both, other
+/// endpoints, other choices are left (exclude left endpoint), right
+/// (exclude right endpoint) and both (exclude both endpoints), optional
+/// \param[in] ctx the function execution context, optional
+///
+/// \return the resulting datum
Review comment:
Also `\since 7.0.0` (optimistically :slightly_smiling_face:)?
##########
File path: cpp/src/arrow/compute/api_scalar.h
##########
@@ -316,6 +316,21 @@ struct ARROW_EXPORT CompareOptions {
enum CompareOperator op;
};
+enum class BetweenMode : int8_t {
+ LESS_EQUAL_LESS_EQUAL,
+ LESS_EQUAL_LESS,
+ LESS_LESS_EQUAL,
+ LESS_LESS,
+};
Review comment:
Hmm, @edponce is this structure what you expected? I think for
consistency with other options it would look like this:
```cpp
class BetweenOptions {
public:
enum Inclusive {
BOTH,
// ...
};
// ...
};
```
...where `enum` is deliberately used over `enum class` so that the enum
members can be accessed as `BetweenOptions::BOTH`.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]