[GitHub] [arrow] edponce commented on a change in pull request #10349: ARROW-12744: [C++][Compute] Add rounding kernel

GitBox Tue, 29 Jun 2021 06:44:00 -0700


edponce commented on a change in pull request #10349:
URL: https://github.com/apache/arrow/pull/10349#discussion_r659957016




##########
File path: docs/source/cpp/compute.rst
##########
@@ -312,6 +313,79 @@ precision of `divide` is at least the sum of precisions of both operands with
 enough scale kept. Error is returned if the result precision is beyond the
 decimal value range.
 
+Rounding functions
+~~~~~~~~~~~~~~~~~~
+
+These functions displace numeric input(s) to approximate and shorter numeric
+representation(s).  Integral input(s) produce floating-point output(s) of same value.
+If any of the input element(s) is null, the corresponding output element is null.
+
++---------------+------------+-------------+-------------+----------------------------------+
+| Function name | Arity      | Input types | Output type | Notes | Options class            |
++===============+============+=============+=============+==================================+
+| mround        | Unary      | Numeric     | Float32/64  | (1)(2) | :struct:`MRoundOptions` |
++---------------+------------+-------------+-------------+----------------------------------+
+| round         | Unary      | Numeric     | Float32/64  | (1)(3) | :struct:`RoundOptions`  |
++---------------+------------+-------------+-------------+----------------------------------+
+
+* \(1) Output value is a 64-bit floating-point for integral inputs and the
+  retains the same type for floating-point inputs.  By default rounding functions
+  displace a value to the nearest integer with a round to even for breaking ties.
+  Options are available to control the rounding behavior.
+* \(2) The ``multiple`` option specifies the rounding
+  scale and precision.  Only the magnitude of the ``rounding multiple`` is used,
+  its sign is ignored.
+* \(3) The ``ndigits`` option specifies the rounding precision in
+  terms of number of digits.  A negative value corresponds to digits in the
+  non-decimal part.
+
++-------------------------+---------------------------------+
+| Round mode              | Description/Examples            |
++=========================+=================================+
+| DOWNWARD                | Equivalent to ``floor(x)``      |
+| TOWARDS_NEG_INFINITY    | 3.7 = 3, -3.2 = -4              |

Review comment:
       Agree, good observation.
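
For reference, the behavior described in notes (2) and (3) and in the DOWNWARD row can be sketched in plain C++. This is only an illustration of the documented semantics, not the kernel from this PR: `RoundToMultiple`, `RoundToDigits` and `RoundDownward` are hypothetical helper names, and the sketch assumes the default `FE_TONEAREST` floating-point environment, under which `std::nearbyint` breaks ties towards the even neighbor.

```cpp
#include <cmath>

// Hypothetical helper: round `value` to the nearest multiple of `multiple`.
// Only the magnitude of `multiple` is used; its sign is ignored (note 2).
// Assumes `multiple` is non-zero.
double RoundToMultiple(double value, double multiple) {
  const double m = std::fabs(multiple);
  // Ties go to even under the default FE_TONEAREST rounding mode.
  return std::nearbyint(value / m) * m;
}

// Hypothetical helper: round `value` to `ndigits` digits (note 3).
// A negative `ndigits` rounds digits in the non-decimal part.
double RoundToDigits(double value, int ndigits) {
  const double scale = std::pow(10.0, ndigits < 0 ? -ndigits : ndigits);
  return ndigits < 0 ? std::nearbyint(value / scale) * scale
                     : std::nearbyint(value * scale) / scale;
}

// DOWNWARD / TOWARDS_NEG_INFINITY: equivalent to floor(x).
double RoundDownward(double value) { return std::floor(value); }
```

With these helpers, `RoundDownward` reproduces the two examples in the table row (3.7 and -3.2), and `RoundToDigits(x, -2)` rounds to the nearest hundred.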

##########
File path: docs/source/cpp/compute.rst
##########
@@ -286,7 +287,7 @@ an ``Invalid`` :class:`Status` when overflow is detected.
 +--------------------------+------------+--------------------+---------------------+
 | power_checked            | Binary     | Numeric            | Numeric             |
 +--------------------------+------------+--------------------+---------------------+
-| subtract                 | Binary     | Numeric            | Numeric (1)         |
+| subtract                 | Binary     | Numeric            | Numeric             |

Review comment:
       Not intentional at all. This was sloppy on my part.

##########
File path: docs/source/cpp/compute.rst
##########
@@ -312,6 +313,79 @@ precision of `divide` is at least the sum of precisions of both operands with
 enough scale kept. Error is returned if the result precision is beyond the
 decimal value range.
 
+Rounding functions
+~~~~~~~~~~~~~~~~~~
+
+These functions displace numeric input(s) to approximate and shorter numeric
+representation(s).  Integral input(s) produce floating-point output(s) of same value.
+If any of the input element(s) is null, the corresponding output element is null.
+
++---------------+------------+-------------+-------------+----------------------------------+
+| Function name | Arity      | Input types | Output type | Notes | Options class            |

Review comment:
       Well, I was trying to mimic the [string transforms](https://arrow.apache.org/docs/cpp/compute.html#string-transforms) table, but noticed that there was a typo, so now *Notes* and *Options class* are different columns.

##########
File path: cpp/src/arrow/compute/kernels/scalar_arithmetic.cc
##########
@@ -454,6 +456,166 @@ struct PowerChecked {
   }
 };
 
+struct RoundUtils {
+  template <typename T, enable_if_t<std::is_floating_point<T>::value, bool> = true>
+  static bool ApproxEqual(const T x, const T y, const int ulp = 7) {
+    // https://en.cppreference.com/w/cpp/types/numeric_limits/epsilon
+    // The machine epsilon has to be scaled to the magnitude of the values used
+    // and multiplied by the desired precision in ULPs (units in the last place)
+    const auto eps_ulp = std::numeric_limits<T>::epsilon() * ulp;
+    const auto xy_diff = std::fabs(x - y);
+    const auto xy_sum = std::fabs(x + y);
+    return (xy_diff <= (xy_sum * eps_ulp))
+           // unless the result is subnormal
+           || (xy_diff < std::numeric_limits<T>::min());
+  }
+
+  template <typename T, enable_if_t<std::is_floating_point<T>::value, bool> = true>
+  static bool IsHalf(T val) {
+    // |frac| == 0.5?
+    return ApproxEqual(std::fabs(std::fmod(val, T(1))), T(0.5));
+  }
+
+  template <typename T>
+  static constexpr enable_if_floating_point<T> Floor(T val) {
+    return std::floor(val);
+  }
+
+  template <typename T>
+  static constexpr enable_if_floating_point<T> Ceiling(T val) {
+    return std::ceil(val);
+  }
+
+  template <typename T>
+  static constexpr enable_if_floating_point<T> Truncate(T val) {
+    return std::trunc(val);
+  }
+
+  template <typename T>
+  static constexpr enable_if_floating_point<T> TowardsInfinity(T val) {
+    return std::signbit(val) ? std::floor(val) : std::ceil(val);
+  }
+
+  template <typename T>
+  static constexpr enable_if_floating_point<T> HalfDown(T val) {
+    return std::ceil(val - T(0.5));
+  }
+
+  template <typename T>
+  static constexpr enable_if_floating_point<T> HalfUp(T val) {
+    return std::floor(val + T(0.5));
+  }
+
+  template <typename T>
+  static enable_if_floating_point<T> HalfToEven(T val) {
+    if (IsHalf(val)) {
+      auto floor = std::floor(val);
+      // Odd + 1, Even + 0
+      return floor + (std::fmod(std::fabs(floor), T(2)) >= T(1));
+    }
+    return std::round(val);
+  }
+
+  template <typename T>
+  static enable_if_floating_point<T> HalfToOdd(T val) {
+    if (IsHalf(val)) {
+      auto floor = std::floor(val);
+      // Odd + 0, Even + 1
+      return floor + (std::fmod(std::fabs(floor), T(2)) < T(1));
+    }
+    return std::round(val);
+  }
+
+  template <typename T>
+  static constexpr enable_if_floating_point<T> Nearest(T val) {
+    return std::round(val);
+  }
+
+  template <typename T>
+  static constexpr enable_if_floating_point<T> HalfTowardsZero(T val) {
+    return std::copysign(std::ceil(std::fabs(val) - T(0.5)), val);
+  }
+
+  template <typename T>
+  static enable_if_floating_point<T> Round(T val, T mult, RoundMode round_mode,
+                                           Status* st) {
+    val /= mult;
+
+    T result;
+    switch (round_mode) {

Review comment:
       Actually, this was something I had thought about, but I did not know how or when to resolve function options that conditionally control which kernel is dispatched. With this knowledge, I make the following observations about conditionally controlled function and kernel dispatching, with the goal of keeping such checks out of the hot loop of execution:
   1. If multiple function variants are available, they are selected explicitly by name when invoking `CallFunction`. Nevertheless, in the public API (e.g. [scalar](https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/api_scalar.h)) the [function options can resolve the variant's name to call](https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/api_scalar.cc#L44-L48).
   2. If multiple kernel variants are available (and not resolved by input type), then function options can be resolved from `KernelContext` when selecting kernel generators (`ArrayKernelExec`). This may require the kernels to have a template parameter for the function option of interest.
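
To make observation 2 concrete, here is a rough, self-contained sketch of resolving a rounding option once and then running a loop that is specialized on it. It does not use the Arrow API: `RoundMode` here is a reduced stand-in enum, and `RoundOp`, `RoundLoop`, `ExecFn`, `SelectKernel` and `RoundAll` are hypothetical names chosen only for illustration. The `HALF_TO_EVEN` case assumes the default `FE_TONEAREST` floating-point environment.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Reduced stand-in for the rounding option; not the enum from this PR.
enum class RoundMode { DOWN, UP, HALF_TO_EVEN };

// One operator per mode, selected at compile time by the template parameter.
template <RoundMode Mode>
struct RoundOp;

template <>
struct RoundOp<RoundMode::DOWN> {
  static double Call(double v) { return std::floor(v); }
};

template <>
struct RoundOp<RoundMode::UP> {
  static double Call(double v) { return std::ceil(v); }
};

template <>
struct RoundOp<RoundMode::HALF_TO_EVEN> {
  // Ties go to even under the default FE_TONEAREST rounding mode.
  static double Call(double v) { return std::nearbyint(v); }
};

// The hot loop: the mode is a template parameter, so there is no per-element
// switch on the option.
template <RoundMode Mode>
void RoundLoop(const double* in, double* out, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) out[i] = RoundOp<Mode>::Call(in[i]);
}

using ExecFn = void (*)(const double*, double*, std::size_t);

// The option is inspected exactly once, when the loop variant is selected.
ExecFn SelectKernel(RoundMode mode) {
  switch (mode) {
    case RoundMode::DOWN:
      return &RoundLoop<RoundMode::DOWN>;
    case RoundMode::UP:
      return &RoundLoop<RoundMode::UP>;
    case RoundMode::HALF_TO_EVEN:
      return &RoundLoop<RoundMode::HALF_TO_EVEN>;
  }
  return nullptr;
}

// Convenience wrapper: select the variant, then run it over the whole input.
std::vector<double> RoundAll(const std::vector<double>& in, RoundMode mode) {
  std::vector<double> out(in.size());
  if (ExecFn exec = SelectKernel(mode)) exec(in.data(), out.data(), in.size());
  return out;
}
```

The trade-off is one loop instantiation per option value, in exchange for keeping the `switch` out of the per-element path.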



