edponce commented on a change in pull request #10349:
URL: https://github.com/apache/arrow/pull/10349#discussion_r659957016
##########
File path: docs/source/cpp/compute.rst
##########
@@ -312,6 +313,79 @@ precision of `divide` is at least the sum of precisions of both operands with
enough scale kept. Error is returned if the result precision is beyond the
decimal value range.
+Rounding functions
+~~~~~~~~~~~~~~~~~~
+
+These functions round numeric input(s) to approximate, shorter numeric
+representation(s). Integral input(s) produce floating-point output(s) of the same value.
+If any of the input element(s) is null, the corresponding output element is null.
+
++---------------+------------+-------------+-------------+----------------------------------+
+| Function name | Arity | Input types | Output type | Notes | Options class |
++===============+============+=============+=============+==================================+
+| mround | Unary | Numeric | Float32/64 | (1)(2) | :struct:`MRoundOptions` |
++---------------+------------+-------------+-------------+----------------------------------+
+| round | Unary | Numeric | Float32/64 | (1)(3) | :struct:`RoundOptions` |
++---------------+------------+-------------+-------------+----------------------------------+
+
+* \(1) Output is a 64-bit floating-point value for integral inputs and
+  retains the same type for floating-point inputs. By default, rounding
+  functions round a value to the nearest integer, rounding half to even
+  to break ties. Options are available to control the rounding behavior.
+* \(2) The ``multiple`` option specifies the rounding scale and precision.
+  Only the magnitude of ``multiple`` is used; its sign is ignored.
+* \(3) The ``ndigits`` option specifies the rounding precision in terms of
+  number of digits. A negative value corresponds to digits in the integer
+  part.
+
++-------------------------+---------------------------------+
+| Round mode | Description/Examples |
++=========================+=================================+
+| DOWNWARD | Equivalent to ``floor(x)`` |
+| TOWARDS_NEG_INFINITY | 3.7 = 3, -3.2 = -4 |
Review comment:
Agree, good observation.
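For reference, a few of the round modes in the quoted table can be sketched with `<cmath>` calls, loosely mirroring the `RoundUtils` helpers in this PR (`Floor`, `Ceiling`, `Truncate`, `HalfUp`, `HalfToEven`). The `Mode` enum and `RoundWithMode` function are hypothetical names for illustration only, and ties are detected with an exact comparison rather than the PR's `ApproxEqual`, so treat this as a sketch rather than the kernel's behavior.

```cpp
#include <cmath>

// Hypothetical subset of the round modes discussed in the docs table.
enum class Mode { kDownward, kUpward, kTowardsZero, kHalfUp, kHalfToEven };

// Round `val` to an integer according to `mode`.
// Ties are detected exactly (val - floor(val) == 0.5), which is only
// reliable for exactly representable halves such as 2.5 or -3.5.
double RoundWithMode(double val, Mode mode) {
  switch (mode) {
    case Mode::kDownward:
      return std::floor(val);  // 3.7 -> 3, -3.2 -> -4
    case Mode::kUpward:
      return std::ceil(val);  // 3.2 -> 4, -3.7 -> -3
    case Mode::kTowardsZero:
      return std::trunc(val);  // 3.7 -> 3, -3.7 -> -3
    case Mode::kHalfUp:
      return std::floor(val + 0.5);  // ties move towards +infinity
    case Mode::kHalfToEven: {
      const double lower = std::floor(val);
      if (val - lower == 0.5) {
        // Tie: step up only when the lower neighbor is odd.
        return lower + (std::fmod(std::fabs(lower), 2.0) == 1.0 ? 1.0 : 0.0);
      }
      return std::round(val);  // not a tie: plain nearest integer
    }
  }
  return val;  // unreachable for valid modes
}
```

With this sketch, `RoundWithMode(2.5, Mode::kHalfToEven)` yields 2, `RoundWithMode(3.5, Mode::kHalfToEven)` yields 4, and `RoundWithMode(-2.5, Mode::kHalfToEven)` yields -2.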
##########
File path: docs/source/cpp/compute.rst
##########
@@ -286,7 +287,7 @@ an ``Invalid`` :class:`Status` when overflow is detected.
+--------------------------+------------+--------------------+---------------------+
| power_checked | Binary | Numeric | Numeric |
+--------------------------+------------+--------------------+---------------------+
-| subtract | Binary | Numeric | Numeric (1) |
+| subtract | Binary | Numeric | Numeric |
Review comment:
Not intentional at all. This was sloppy on my part.
##########
File path: docs/source/cpp/compute.rst
##########
@@ -312,6 +313,79 @@ precision of `divide` is at least the sum of precisions of both operands with
enough scale kept. Error is returned if the result precision is beyond the
decimal value range.
+Rounding functions
+~~~~~~~~~~~~~~~~~~
+
+These functions round numeric input(s) to approximate, shorter numeric
+representation(s). Integral input(s) produce floating-point output(s) of the same value.
+If any of the input element(s) is null, the corresponding output element is null.
+
++---------------+------------+-------------+-------------+----------------------------------+
+| Function name | Arity | Input types | Output type | Notes | Options class |
Review comment:
Well, I was trying to mimic the [string transforms](https://arrow.apache.org/docs/cpp/compute.html#string-transforms)
table, but I noticed a typo, so now *Notes* and *Options class* are separate
columns.
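Notes (2) and (3) of the quoted docs describe two ways of setting the rounding scale. A minimal sketch of that arithmetic, assuming `ndigits` corresponds to a multiple of 10^-ndigits and that rounding to a multiple means divide, round, multiply back (the PR's `RoundUtils::Round` does start with `val /= mult`). `RoundToMultiple` and `RoundToDigits` are hypothetical helpers, and they use `std::round` (ties away from zero) for brevity instead of the documented half-to-even default.

```cpp
#include <cmath>

// Round `val` to the nearest multiple of `multiple`.
// Only the magnitude of `multiple` is used; its sign is ignored.
double RoundToMultiple(double val, double multiple) {
  const double m = std::fabs(multiple);
  return std::round(val / m) * m;
}

// Round `val` to `ndigits` decimal digits. A negative `ndigits`
// rounds digits in the integer part (e.g. -2 rounds to hundreds).
double RoundToDigits(double val, int ndigits) {
  return RoundToMultiple(val, std::pow(10.0, -ndigits));
}
```

For example, `RoundToMultiple(17.0, -5.0)` gives 15 and `RoundToDigits(1234.0, -2)` gives 1200. Because 10^-ndigits is generally not exactly representable for positive `ndigits`, results such as `RoundToDigits(3.14159, 2)` are only approximately 3.14.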
##########
File path: cpp/src/arrow/compute/kernels/scalar_arithmetic.cc
##########
@@ -454,6 +456,166 @@ struct PowerChecked {
}
};
+struct RoundUtils {
+ template <typename T, enable_if_t<std::is_floating_point<T>::value, bool> = true>
+ static bool ApproxEqual(const T x, const T y, const int ulp = 7) {
+ // https://en.cppreference.com/w/cpp/types/numeric_limits/epsilon
+ // The machine epsilon has to be scaled to the magnitude of the values used
+ // and multiplied by the desired precision in ULPs (units in the last place)
+ const auto eps_ulp = std::numeric_limits<T>::epsilon() * ulp;
+ const auto xy_diff = std::fabs(x - y);
+ const auto xy_sum = std::fabs(x + y);
+ return (xy_diff <= (xy_sum * eps_ulp))
+ // unless the result is subnormal
+ || (xy_diff < std::numeric_limits<T>::min());
+ }
+
+ template <typename T, enable_if_t<std::is_floating_point<T>::value, bool> = true>
+ static bool IsHalf(T val) {
+ // |frac| == 0.5?
+ return ApproxEqual(std::fabs(std::fmod(val, T(1))), T(0.5));
+ }
+
+ template <typename T>
+ static constexpr enable_if_floating_point<T> Floor(T val) {
+ return std::floor(val);
+ }
+
+ template <typename T>
+ static constexpr enable_if_floating_point<T> Ceiling(T val) {
+ return std::ceil(val);
+ }
+
+ template <typename T>
+ static constexpr enable_if_floating_point<T> Truncate(T val) {
+ return std::trunc(val);
+ }
+
+ template <typename T>
+ static constexpr enable_if_floating_point<T> TowardsInfinity(T val) {
+ return std::signbit(val) ? std::floor(val) : std::ceil(val);
+ }
+
+ template <typename T>
+ static constexpr enable_if_floating_point<T> HalfDown(T val) {
+ return std::ceil(val - T(0.5));
+ }
+
+ template <typename T>
+ static constexpr enable_if_floating_point<T> HalfUp(T val) {
+ return std::floor(val + T(0.5));
+ }
+
+ template <typename T>
+ static enable_if_floating_point<T> HalfToEven(T val) {
+ if (IsHalf(val)) {
+ auto floor = std::floor(val);
+ // Odd + 1, Even + 0
+ return floor + (std::fmod(std::fabs(floor), T(2)) >= T(1));
+ }
+ return std::round(val);
+ }
+
+ template <typename T>
+ static enable_if_floating_point<T> HalfToOdd(T val) {
+ if (IsHalf(val)) {
+ auto floor = std::floor(val);
+ // Odd + 0, Even + 1
+ return floor + (std::fmod(std::fabs(floor), T(2)) < T(1));
+ }
+ return std::round(val);
+ }
+
+ template <typename T>
+ static constexpr enable_if_floating_point<T> Nearest(T val) {
+ return std::round(val);
+ }
+
+ template <typename T>
+ static constexpr enable_if_floating_point<T> HalfTowardsZero(T val) {
+ return std::copysign(std::ceil(std::fabs(val) - T(0.5)), val);
+ }
+
+ template <typename T>
+ static enable_if_floating_point<T> Round(T val, T mult, RoundMode round_mode,
+ Status* st) {
+ val /= mult;
+
+ T result;
+ switch (round_mode) {
Review comment:
Actually, this was something I had thought about, but I did not know how or
when to resolve function options that conditionally control which kernel is
dispatched. With this knowledge, here are my observations on option-controlled
function and kernel dispatch, with the goal of keeping such checks out of the
hot loop of execution:
1. If multiple function variants are available, the caller selects one
explicitly by name when invoking `CallFunction`. However, in the public API
(e.g. [scalar](https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/api_scalar.h))
the [function options can resolve the name of the variant to call](https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/api_scalar.cc#L44-L48).
2. If multiple kernel variants are available (and are not resolved by input
type), then function options can be read from `KernelContext` when
selecting kernel generators (`ArrayKernelExec`). This may require the kernels
to have a template parameter for the function option of interest.
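Point 2 can be illustrated in plain C++: template the per-element operation on the round mode and resolve the mode once, when the exec function is selected, so the `switch` runs once per call instead of once per element. `RoundMode`, `RoundElem`, `RoundExec`, `ExecFn` and `SelectExec` are hypothetical stand-ins; this sketch does not use Arrow's actual `KernelContext` or `ArrayKernelExec` types.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

enum class RoundMode { kDown, kUp, kHalfToEven };

// Per-element operation, specialized at compile time on the mode.
// The switch condition is a template constant, so it can be folded away.
template <RoundMode kMode>
double RoundElem(double val) {
  switch (kMode) {
    case RoundMode::kDown:
      return std::floor(val);
    case RoundMode::kUp:
      return std::ceil(val);
    case RoundMode::kHalfToEven:
      // Uses the current FP rounding mode; the default (round to nearest)
      // breaks ties to even.
      return std::nearbyint(val);
  }
  return val;
}

// Hot loop: no option checks inside.
template <RoundMode kMode>
void RoundExec(const std::vector<double>& in, std::vector<double>* out) {
  out->resize(in.size());
  for (std::size_t i = 0; i < in.size(); ++i) {
    (*out)[i] = RoundElem<kMode>(in[i]);
  }
}

using ExecFn = void (*)(const std::vector<double>&, std::vector<double>*);

// Resolve the option once, when selecting the exec function.
ExecFn SelectExec(RoundMode mode) {
  switch (mode) {
    case RoundMode::kDown:
      return RoundExec<RoundMode::kDown>;
    case RoundMode::kUp:
      return RoundExec<RoundMode::kUp>;
    case RoundMode::kHalfToEven:
      return RoundExec<RoundMode::kHalfToEven>;
  }
  return nullptr;
}
```

For instance, `SelectExec(RoundMode::kHalfToEven)` returns an exec function that maps {0.5, 1.5, 2.5} to {0, 2, 2}. The trade-off of this design is one template instantiation per mode, i.e. more generated code in exchange for a branch-free inner loop.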
--
This is an automated message from the Apache Git Service.