[GitHub] [arrow] bkietz commented on a change in pull request #10349: ARROW-12744: [C++][Compute] Add rounding kernel

GitBox Mon, 28 Jun 2021 09:39:21 -0700


bkietz commented on a change in pull request #10349:
URL: https://github.com/apache/arrow/pull/10349#discussion_r659917735




##########
File path: docs/source/cpp/compute.rst
##########
@@ -312,6 +313,79 @@ precision of `divide` is at least the sum of precisions of 
both operands with
 enough scale kept. Error is returned if the result precision is beyond the
 decimal value range.
 
+Rounding functions
+~~~~~~~~~~~~~~~~~~
+
+These functions displace numeric input(s) to approximate and shorter numeric
+representation(s).  Integral input(s) produce floating-point output(s) of same 
value.
+If any of the input element(s) is null, the corresponding output element is 
null.
+
++---------------+------------+-------------+-------------+----------------------------------+
+| Function name | Arity      | Input types | Output type | Notes | Options 
class            |
++===============+============+=============+=============+==================================+
+| mround        | Unary      | Numeric     | Float32/64  | (1)(2) | 
:struct:`MRoundOptions` |
++---------------+------------+-------------+-------------+----------------------------------+
+| round         | Unary      | Numeric     | Float32/64  | (1)(3) | 
:struct:`RoundOptions`  |
++---------------+------------+-------------+-------------+----------------------------------+
+
+* \(1) Output value is a 64-bit floating-point for integral inputs and the
+  retains the same type for floating-point inputs.  By default rounding 
functions
+  displace a value to the nearest integer with a round to even for breaking 
ties.
+  Options are available to control the rounding behavior.
+* \(2) The ``multiple`` option specifies the rounding
+  scale and precision.  Only the magnitude of the ``rounding multiple`` is 
used,
+  its sign is ignored.
+* \(3) The ``ndigits`` option specifies the rounding precision in
+  terms of number of digits.  A negative value corresponds to digits in the
+  non-decimal part.
+
++-------------------------+---------------------------------+
+| Round mode              | Description/Examples            |
++=========================+=================================+
+| DOWNWARD                | Equivalent to ``floor(x)``      |
+| TOWARDS_NEG_INFINITY    | 3.7 = 3, -3.2 = -4              |

Review comment:
       Nit: using `=` like this is confusing
   ```suggestion
   | TOWARDS_NEG_INFINITY    | 3.7 -> 3, -3.2 -> -4            |
   ```

##########
File path: cpp/src/arrow/compute/kernels/scalar_arithmetic.cc
##########
@@ -454,6 +456,166 @@ struct PowerChecked {
   }
 };
 
+struct RoundUtils {
+  template <typename T, enable_if_t<std::is_floating_point<T>::value, bool> = 
true>
+  static bool ApproxEqual(const T x, const T y, const int ulp = 7) {
+    // https://en.cppreference.com/w/cpp/types/numeric_limits/epsilon
+    // The machine epsilon has to be scaled to the magnitude of the values used
+    // and multiplied by the desired precision in ULPs (units in the last 
place)
+    const auto eps_ulp = std::numeric_limits<T>::epsilon() * ulp;
+    const auto xy_diff = std::fabs(x - y);
+    const auto xy_sum = std::fabs(x + y);
+    return (xy_diff <= (xy_sum * eps_ulp))
+           // unless the result is subnormal
+           || (xy_diff < std::numeric_limits<T>::min());
+  }
+
+  template <typename T, enable_if_t<std::is_floating_point<T>::value, bool> = 
true>
+  static bool IsHalf(T val) {
+    // |frac| == 0.5?
+    return ApproxEqual(std::fabs(std::fmod(val, T(1))), T(0.5));
+  }
+
+  template <typename T>
+  static constexpr enable_if_floating_point<T> Floor(T val) {
+    return std::floor(val);
+  }
+
+  template <typename T>
+  static constexpr enable_if_floating_point<T> Ceiling(T val) {
+    return std::ceil(val);
+  }
+
+  template <typename T>
+  static constexpr enable_if_floating_point<T> Truncate(T val) {
+    return std::trunc(val);
+  }
+
+  template <typename T>
+  static constexpr enable_if_floating_point<T> TowardsInfinity(T val) {
+    return std::signbit(val) ? std::floor(val) : std::ceil(val);
+  }
+
+  template <typename T>
+  static constexpr enable_if_floating_point<T> HalfDown(T val) {
+    return std::ceil(val - T(0.5));
+  }
+
+  template <typename T>
+  static constexpr enable_if_floating_point<T> HalfUp(T val) {
+    return std::floor(val + T(0.5));
+  }
+
+  template <typename T>
+  static enable_if_floating_point<T> HalfToEven(T val) {
+    if (IsHalf(val)) {
+      auto floor = std::floor(val);
+      // Odd + 1, Even + 0
+      return floor + (std::fmod(std::fabs(floor), T(2)) >= T(1));
+    }
+    return std::round(val);
+  }
+
+  template <typename T>
+  static enable_if_floating_point<T> HalfToOdd(T val) {
+    if (IsHalf(val)) {
+      auto floor = std::floor(val);
+      // Odd + 0, Even + 1
+      return floor + (std::fmod(std::fabs(floor), T(2)) < T(1));
+    }
+    return std::round(val);
+  }
+
+  template <typename T>
+  static constexpr enable_if_floating_point<T> Nearest(T val) {
+    return std::round(val);
+  }
+
+  template <typename T>
+  static constexpr enable_if_floating_point<T> HalfTowardsZero(T val) {
+    return std::copysign(std::ceil(std::fabs(val) - T(0.5)), val);
+  }
+
+  template <typename T>
+  static enable_if_floating_point<T> Round(T val, T mult, RoundMode round_mode,
+                                           Status* st) {
+    val /= mult;
+
+    T result;
+    switch (round_mode) {

Review comment:
       Let's ensure this switch is outside the hot loop. In this case, that 
would probably entail: make `round_mode` a template parameter of 
`RoundUtils::Round`, `struct Round`, and `struct MRound`. Then you can produce 
a vector of `ArrayKernelExec` which can be selected from outside the loop, 
something like:
   
   ```c++
     auto func =
         std::make_shared<ArithmeticUnaryFloatOnlyFunction>(name, 
Arity::Unary(), doc, default_options);
     for (const auto& ty : {float32(), float64()}) {
       std::vector<ArrayKernelExec> execs = {
         Round<RoundMode::DOWNWARD>,
         Round<RoundMode::UPWARD>,
         //...
       };
       auto exec = [execs](KernelContext* ctx, const ExecBatch& batch, Datum* 
out) {
         RoundMode round_mode = 
OptionsWrapper<RoundOptions>::Get(ctx).round_mode;
         return execs[round_mode](ctx, batch, out);
       };
       DCHECK_OK(func->AddKernel({ty}, out_ty, exec, init));
     }
     return func;
   ```

##########
File path: docs/source/cpp/compute.rst
##########
@@ -312,6 +313,79 @@ precision of `divide` is at least the sum of precisions of 
both operands with
 enough scale kept. Error is returned if the result precision is beyond the
 decimal value range.
 
+Rounding functions
+~~~~~~~~~~~~~~~~~~
+
+These functions displace numeric input(s) to approximate and shorter numeric
+representation(s).  Integral input(s) produce floating-point output(s) of same 
value.
+If any of the input element(s) is null, the corresponding output element is 
null.
+
++---------------+------------+-------------+-------------+----------------------------------+
+| Function name | Arity      | Input types | Output type | Notes | Options 
class            |
++===============+============+=============+=============+==================================+
+| mround        | Unary      | Numeric     | Float32/64  | (1)(2) | 
:struct:`MRoundOptions` |
++---------------+------------+-------------+-------------+----------------------------------+
+| round         | Unary      | Numeric     | Float32/64  | (1)(3) | 
:struct:`RoundOptions`  |
++---------------+------------+-------------+-------------+----------------------------------+
+
+* \(1) Output value is a 64-bit floating-point for integral inputs and the
+  retains the same type for floating-point inputs.  By default rounding 
functions
+  displace a value to the nearest integer with a round to even for breaking 
ties.
+  Options are available to control the rounding behavior.
+* \(2) The ``multiple`` option specifies the rounding
+  scale and precision.  Only the magnitude of the ``rounding multiple`` is 
used,

Review comment:
       ```suggestion
     scale and precision.  Only the absolute value of the ``rounding multiple`` 
is used,
   ```

##########
File path: cpp/src/arrow/compute/kernels/scalar_arithmetic.cc
##########
@@ -485,6 +647,36 @@ ArrayKernelExec ArithmeticExecFromOp(detail::GetTypeId 
get_id) {
   }
 }
 
+template <template <typename... Args> class KernelGenerator, typename Op>
+ArrayKernelExec GenerateArithmeticWithFloatOutType(detail::GetTypeId get_id) {

Review comment:
       Instead, wouldn't it make more sense to just insert an implicit cast to 
floating point?

##########
File path: docs/source/cpp/compute.rst
##########
@@ -312,6 +313,79 @@ precision of `divide` is at least the sum of precisions of 
both operands with
 enough scale kept. Error is returned if the result precision is beyond the
 decimal value range.
 
+Rounding functions
+~~~~~~~~~~~~~~~~~~
+
+These functions displace numeric input(s) to approximate and shorter numeric
+representation(s).  Integral input(s) produce floating-point output(s) of same 
value.
+If any of the input element(s) is null, the corresponding output element is 
null.
+
++---------------+------------+-------------+-------------+----------------------------------+
+| Function name | Arity      | Input types | Output type | Notes | Options 
class            |

Review comment:
       ```suggestion
   | Function name | Arity      | Input types | Output type | Notes  | Options 
class           |
   ```

##########
File path: cpp/src/arrow/compute/api_scalar.h
##########
@@ -42,6 +42,55 @@ struct ArithmeticOptions : public FunctionOptions {
   bool check_overflow;
 };
 
+enum class RoundMode {
+  // floor (towards negative infinity)
+  DOWNWARD,
+  TOWARDS_NEG_INFINITY,

Review comment:
       If these are equivalent, shouldn't they have the same value?
   ```suggestion
     DOWNWARD,
     TOWARDS_NEG_INFINITY = DOWNWARD,
   ```

##########
File path: docs/source/cpp/compute.rst
##########
@@ -286,7 +287,7 @@ an ``Invalid`` :class:`Status` when overflow is detected.
 
+--------------------------+------------+--------------------+---------------------+
 | power_checked            | Binary     | Numeric            | Numeric         
    |
 
+--------------------------+------------+--------------------+---------------------+
-| subtract                 | Binary     | Numeric            | Numeric (1)     
    |
+| subtract                 | Binary     | Numeric            | Numeric         
    |

Review comment:
       Was this intentional?

##########
File path: docs/source/cpp/compute.rst
##########
@@ -312,6 +313,79 @@ precision of `divide` is at least the sum of precisions of 
both operands with
 enough scale kept. Error is returned if the result precision is beyond the
 decimal value range.
 
+Rounding functions
+~~~~~~~~~~~~~~~~~~
+
+These functions displace numeric input(s) to approximate and shorter numeric
+representation(s).  Integral input(s) produce floating-point output(s) of same 
value.
+If any of the input element(s) is null, the corresponding output element is 
null.
+
++---------------+------------+-------------+-------------+----------------------------------+
+| Function name | Arity      | Input types | Output type | Notes | Options 
class            |
++===============+============+=============+=============+==================================+
+| mround        | Unary      | Numeric     | Float32/64  | (1)(2) | 
:struct:`MRoundOptions` |
++---------------+------------+-------------+-------------+----------------------------------+
+| round         | Unary      | Numeric     | Float32/64  | (1)(3) | 
:struct:`RoundOptions`  |
++---------------+------------+-------------+-------------+----------------------------------+
+
+* \(1) Output value is a 64-bit floating-point for integral inputs and the
+  retains the same type for floating-point inputs.  By default rounding 
functions
+  displace a value to the nearest integer with a round to even for breaking 
ties.
+  Options are available to control the rounding behavior.
+* \(2) The ``multiple`` option specifies the rounding
+  scale and precision.  Only the magnitude of the ``rounding multiple`` is 
used,
+  its sign is ignored.
+* \(3) The ``ndigits`` option specifies the rounding precision in
+  terms of number of digits.  A negative value corresponds to digits in the
+  non-decimal part.
+
++-------------------------+---------------------------------+
+| Round mode              | Description/Examples            |
++=========================+=================================+
+| DOWNWARD                | Equivalent to ``floor(x)``      |
+| TOWARDS_NEG_INFINITY    | 3.7 = 3, -3.2 = -4              |
++-------------------------+---------------------------------+
+| UPWARD                  | Equivalent to ``ceil(x)``       |
+| TOWARDS_POS_INFINITY    | 3.2 = 4, -3.7 = -3              |
++-------------------------+---------------------------------+
+| TOWARDS_ZERO            | Equivalent to ``trunc(x)``      |
+| AWAY_FROM_INFINITY      | 3.7 = 3, -3.7 = -3              |
++-------------------------+---------------------------------+
+| TOWARDS_INFINITY        | 3.2 = 4, -3.2 = -4              |
+| AWAY_FROM_ZERO          |                                 |
++-------------------------+---------------------------------+
+| HALF_UP                 | 3.5 = 4, 4.5 = 5, -3.5 = -3     |
+| HALF_POS_INFINITY       |                                 |
++-------------------------+---------------------------------+
+| HALF_DOWN               | 3.5 = 3, 4.5 = 4, -3.5 = -4     |
+| HALF_NEG_INFINITY       |                                 |
++-------------------------+---------------------------------+
+| HALF_TO_EVEN            | 3.5 = 4, 4.5 = 4, -3.5 = -4     |
++-------------------------+---------------------------------+
+| HALF_TO_ODD             | 3.5 = 3, 4.5 = 5, -3.5 = -3     |
++-------------------------+---------------------------------+
+| HALF_TOWARDS_ZERO       | 3.5 = 3, 4.5 = 4, -3.5 = -3     |
+| HALF_AWAY_FROM_INFINITY |                                 |
++-------------------------+---------------------------------+
+| HALF_TOWARDS_INFINITY   | Round nearest integer           |
+| HALF_AWAY_FROM_ZERO     | 3.5 = 4, 4.5 = 5, -3.5 = -4     |
+| NEAREST                 |                                 |
++-------------------------+---------------------------------+
+
++----------------+---------------+----------------------------+
+| Round multiple | Round ndigits | Description                |
++================================+============================+
+| 1.0            | 0             | Round to integer           |

Review comment:
       Could you add some examples here too? Additionally, could you break this 
into two tables (one for `mround` and one for `round`)? From this table it 
looks like one might specify both `multiple` and `ndigits` to a single function 
call

##########
File path: cpp/src/arrow/compute/kernels/scalar_arithmetic.cc
##########
@@ -20,6 +20,8 @@
 #include <limits>
 #include <utility>
 
+#include "arrow/compute/api_scalar.h"
+// #include "arrow/compute/kernels/codegen_internal.h"

Review comment:
       ```suggestion
   ```

##########
File path: docs/source/cpp/compute.rst
##########
@@ -312,6 +313,79 @@ precision of `divide` is at least the sum of precisions of 
both operands with
 enough scale kept. Error is returned if the result precision is beyond the
 decimal value range.
 
+Rounding functions
+~~~~~~~~~~~~~~~~~~
+
+These functions displace numeric input(s) to approximate and shorter numeric
+representation(s).  Integral input(s) produce floating-point output(s) of same 
value.
+If any of the input element(s) is null, the corresponding output element is 
null.
+
++---------------+------------+-------------+-------------+----------------------------------+
+| Function name | Arity      | Input types | Output type | Notes | Options 
class            |
++===============+============+=============+=============+==================================+
+| mround        | Unary      | Numeric     | Float32/64  | (1)(2) | 
:struct:`MRoundOptions` |
++---------------+------------+-------------+-------------+----------------------------------+
+| round         | Unary      | Numeric     | Float32/64  | (1)(3) | 
:struct:`RoundOptions`  |
++---------------+------------+-------------+-------------+----------------------------------+
+
+* \(1) Output value is a 64-bit floating-point for integral inputs and the
+  retains the same type for floating-point inputs.  By default rounding 
functions
+  displace a value to the nearest integer with a round to even for breaking 
ties.
+  Options are available to control the rounding behavior.
+* \(2) The ``multiple`` option specifies the rounding
+  scale and precision.  Only the magnitude of the ``rounding multiple`` is 
used,
+  its sign is ignored.
+* \(3) The ``ndigits`` option specifies the rounding precision in
+  terms of number of digits.  A negative value corresponds to digits in the
+  non-decimal part.
+
++-------------------------+---------------------------------+
+| Round mode              | Description/Examples            |
++=========================+=================================+
+| DOWNWARD                | Equivalent to ``floor(x)``      |
+| TOWARDS_NEG_INFINITY    | 3.7 = 3, -3.2 = -4              |
++-------------------------+---------------------------------+
+| UPWARD                  | Equivalent to ``ceil(x)``       |
+| TOWARDS_POS_INFINITY    | 3.2 = 4, -3.7 = -3              |
++-------------------------+---------------------------------+
+| TOWARDS_ZERO            | Equivalent to ``trunc(x)``      |
+| AWAY_FROM_INFINITY      | 3.7 = 3, -3.7 = -3              |
++-------------------------+---------------------------------+
+| TOWARDS_INFINITY        | 3.2 = 4, -3.2 = -4              |
+| AWAY_FROM_ZERO          |                                 |
++-------------------------+---------------------------------+
+| HALF_UP                 | 3.5 = 4, 4.5 = 5, -3.5 = -3     |
+| HALF_POS_INFINITY       |                                 |
++-------------------------+---------------------------------+
+| HALF_DOWN               | 3.5 = 3, 4.5 = 4, -3.5 = -4     |
+| HALF_NEG_INFINITY       |                                 |
++-------------------------+---------------------------------+
+| HALF_TO_EVEN            | 3.5 = 4, 4.5 = 4, -3.5 = -4     |
++-------------------------+---------------------------------+
+| HALF_TO_ODD             | 3.5 = 3, 4.5 = 5, -3.5 = -3     |
++-------------------------+---------------------------------+
+| HALF_TOWARDS_ZERO       | 3.5 = 3, 4.5 = 4, -3.5 = -3     |
+| HALF_AWAY_FROM_INFINITY |                                 |
++-------------------------+---------------------------------+
+| HALF_TOWARDS_INFINITY   | Round nearest integer           |
+| HALF_AWAY_FROM_ZERO     | 3.5 = 4, 4.5 = 5, -3.5 = -4     |
+| NEAREST                 |                                 |
++-------------------------+---------------------------------+
+
++----------------+---------------+----------------------------+
+| Round multiple | Round ndigits | Description                |
++================================+============================+
+| 1.0            | 0             | Round to integer           |
++----------------+--------------------------------------------+
+| 0.001          | 3             | Round to 3 decimal places  |
++----------------+--------------------------------------------+
+| 10             | -2            | Round to multiple of 10    |

Review comment:
       ```suggestion
   | 10             | -2            | Round to multiple of 100   |
   ```

##########
File path: docs/source/cpp/compute.rst
##########
@@ -312,6 +313,79 @@ precision of `divide` is at least the sum of precisions of 
both operands with
 enough scale kept. Error is returned if the result precision is beyond the
 decimal value range.
 
+Rounding functions
+~~~~~~~~~~~~~~~~~~
+
+These functions displace numeric input(s) to approximate and shorter numeric
+representation(s).  Integral input(s) produce floating-point output(s) of same 
value.
+If any of the input element(s) is null, the corresponding output element is 
null.
+
++---------------+------------+-------------+-------------+----------------------------------+
+| Function name | Arity      | Input types | Output type | Notes | Options 
class            |
++===============+============+=============+=============+==================================+
+| mround        | Unary      | Numeric     | Float32/64  | (1)(2) | 
:struct:`MRoundOptions` |
++---------------+------------+-------------+-------------+----------------------------------+
+| round         | Unary      | Numeric     | Float32/64  | (1)(3) | 
:struct:`RoundOptions`  |
++---------------+------------+-------------+-------------+----------------------------------+
+
+* \(1) Output value is a 64-bit floating-point for integral inputs and the
+  retains the same type for floating-point inputs.  By default rounding 
functions
+  displace a value to the nearest integer with a round to even for breaking 
ties.
+  Options are available to control the rounding behavior.
+* \(2) The ``multiple`` option specifies the rounding
+  scale and precision.  Only the magnitude of the ``rounding multiple`` is 
used,
+  its sign is ignored.
+* \(3) The ``ndigits`` option specifies the rounding precision in
+  terms of number of digits.  A negative value corresponds to digits in the
+  non-decimal part.

Review comment:
       ```suggestion
     non-fractional part. For example -2 corresponds to rounding to the nearest 
multiple of 100
     (zeroing the ones and tens digits).
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [arrow] bkietz commented on a change in pull request #10349: ARROW-12744: [C++][Compute] Add rounding kernel

Reply via email to