pitrou commented on a change in pull request #10960:
URL: https://github.com/apache/arrow/pull/10960#discussion_r712210124
##########
File path: cpp/src/arrow/util/formatting_util_test.cc
##########
@@ -427,4 +427,42 @@ TEST(Formatting, Timestamp) {
}
}
+TEST(Formatting, Interval) {
+ using DayMilliseconds = DayTimeIntervalType::DayMilliseconds;
+ using MonthDayNanos = MonthDayNanoIntervalType::MonthDayNanos;
+
+ const int32_t max_int32 = std::numeric_limits<int32_t>::max();
+ const int32_t min_int32 = std::numeric_limits<int32_t>::min();
+ const int64_t max_int64 = std::numeric_limits<int64_t>::max();
+ const int64_t min_int64 = std::numeric_limits<int64_t>::min();
+ {
+ StringFormatter<MonthIntervalType> formatter(month_interval());
+
+ AssertFormatting(formatter, 0, "0m");
Review comment:
Is it a recommended format for months? Should we use "M" instead?
##########
File path: docs/source/cpp/compute.rst
##########
@@ -1364,6 +1364,50 @@ For timestamps inputs with non-empty timezone, localized
timestamp components wi
.. _ISO 8601 week date definition:
https://en.wikipedia.org/wiki/ISO_week_date#First_week
+Temporal difference
+~~~~~~~~~~~~~~~~~~~
+
+These functions compute the difference between two timestamps in the
+specified unit. The difference is determined by the number of
+boundaries crossed, not the span of time. For example, the difference
+in days between 23:59:59 on one day and 00:00:01 on the next day is
+one day (since midnight was crossed), not zero days (even though less
+than 24 hours elapsed). Additionally, if the timestamp is zoned, the
+difference is calculated in the local timezone. For instance, the
+difference in years between "2019-12-31 18:00:00-0500" and "2019-12-31
+23:00:00-0500" is zero years, even though the underlying values in UTC
+do differ by a year.
+
++---------------------------------+------------+-------------------+-----------------------+----------------------------+-------+
+| Function name | Arity | Input types | Output
type | Options class | Notes |
Review comment:
If the "notes" column is empty, then can omit it.
##########
File path: docs/source/cpp/compute.rst
##########
@@ -1356,6 +1356,50 @@ For timestamps inputs with non-empty timezone, localized
timestamp components wi
.. _ISO 8601 week date definition:
https://en.wikipedia.org/wiki/ISO_week_date#First_week
+Temporal difference
+~~~~~~~~~~~~~~~~~~~
+
+These functions compute the difference between two timestamps in the
+specified unit. The difference is determined by the number of
+boundaries crossed, not the span of time. For example, the difference
+in days between 23:59:59 on one day and 00:00:01 on the next day is
+one day (since midnight was crossed), not zero days (even though less
+than 24 hours elapsed). Additionally, if the timestamp is zoned, the
+difference is calculated in the local timezone. For instance, the
+difference in years between "2019-12-31 18:00:00-0500" and "2019-12-31
+23:00:00-0500" is zero years, even though the underlying values in UTC
+do differ by a year.
+
++---------------------------------+------------+-------------------+-----------------------+----------------------------+-------+
+| Function name | Arity | Input types | Output
type | Options class | Notes |
++=================================+============+===================+=======================+============================+=======+
+| day_time_interval_between | Binary | Timestamp | DayTime
interval | | |
++---------------------------------+------------+-------------------+-----------------------+----------------------------+-------+
+| days_between | Binary | Timestamp | Int64
| | |
++---------------------------------+------------+-------------------+-----------------------+----------------------------+-------+
+| hours_between | Binary | Timestamp | Int64
| | |
++---------------------------------+------------+-------------------+-----------------------+----------------------------+-------+
+| microseconds_between | Binary | Timestamp | Int64
| | |
++---------------------------------+------------+-------------------+-----------------------+----------------------------+-------+
+| milliseconds_between | Binary | Timestamp | Int64
| | |
++---------------------------------+------------+-------------------+-----------------------+----------------------------+-------+
+| minutes_between | Binary | Timestamp | Int64
| | |
++---------------------------------+------------+-------------------+-----------------------+----------------------------+-------+
+| month_interval_between | Binary | Timestamp | Month
interval | | |
Review comment:
Well, is it useful to have a `int64_t` return?
##########
File path: docs/source/cpp/compute.rst
##########
@@ -1364,6 +1364,50 @@ For timestamps inputs with non-empty timezone, localized
timestamp components wi
.. _ISO 8601 week date definition:
https://en.wikipedia.org/wiki/ISO_week_date#First_week
+Temporal difference
+~~~~~~~~~~~~~~~~~~~
+
+These functions compute the difference between two timestamps in the
+specified unit. The difference is determined by the number of
+boundaries crossed, not the span of time. For example, the difference
+in days between 23:59:59 on one day and 00:00:01 on the next day is
+one day (since midnight was crossed), not zero days (even though less
+than 24 hours elapsed). Additionally, if the timestamp is zoned, the
Review comment:
"has a defined timezone" is less weird than "is zoned".
##########
File path: cpp/src/arrow/compute/kernels/scalar_temporal.cc
##########
@@ -1103,6 +1399,114 @@ const FunctionDoc assume_timezone_doc{
{"timestamps"},
"AssumeTimezoneOptions"};
+const FunctionDoc years_between_doc{
+ "Compute the number of years between two timestamps",
+ ("Returns the number of year boundaries crossed from the first timestamp
to the\n"
Review comment:
How about: ```Return the number of year boundaries crossed from `start`
to `end`.```
##########
File path: cpp/src/arrow/compute/kernels/scalar_temporal.cc
##########
@@ -191,6 +198,60 @@ struct TemporalComponentExtract
}
};
+Status CheckTimezones(const ExecBatch& batch) {
+ const auto& timezone = GetInputTimezone(batch.values[0]);
+ for (int i = 1; i < batch.num_values(); i++) {
+ const auto& other_timezone = GetInputTimezone(batch.values[i]);
+ if (other_timezone != timezone) {
+ return Status::TypeError("Got differing time zone '", other_timezone,
+ "' for argument ", i + 1, "; expected '",
timezone, "'");
+ }
+ }
+ return Status::OK();
+}
+
+Status ValidateDayOfWeekOptions(const DayOfWeekOptions& options) {
Review comment:
At some point, we may instead define a `Validate()` method on
FunctionOptions.
##########
File path: cpp/src/arrow/compute/kernels/scalar_temporal.cc
##########
@@ -191,6 +198,60 @@ struct TemporalComponentExtract
}
};
+Status CheckTimezones(const ExecBatch& batch) {
+ const auto& timezone = GetInputTimezone(batch.values[0]);
+ for (int i = 1; i < batch.num_values(); i++) {
+ const auto& other_timezone = GetInputTimezone(batch.values[i]);
+ if (other_timezone != timezone) {
+ return Status::TypeError("Got differing time zone '", other_timezone,
+ "' for argument ", i + 1, "; expected '",
timezone, "'");
+ }
+ }
+ return Status::OK();
+}
+
+Status ValidateDayOfWeekOptions(const DayOfWeekOptions& options) {
+ if (options.week_start < 1 || 7 < options.week_start) {
+ return Status::Invalid(
+ "week_start must follow ISO convention (Monday=1, Sunday=7). Got
week_start=",
+ options.week_start);
+ }
+ return Status::OK();
+}
+
+template <template <typename...> class Op, typename Duration, typename InType,
+ typename OutType>
+struct TemporalComponentExtractBinary {
Review comment:
Nit, but "Extract" refers to component extraction, which this isn't
doing?
##########
File path: cpp/src/arrow/compute/kernels/scalar_temporal_test.cc
##########
@@ -433,6 +512,173 @@ TEST_F(ScalarTemporalTest, DayOfWeek) {
/*week_start=*/8)));
}
+TEST_F(ScalarTemporalTest, TestTemporalDifference) {
+ for (auto u : TimeUnit::values()) {
+ auto unit = timestamp(u);
+ auto arr1 = ArrayFromJSON(unit, times_seconds_precision);
+ auto arr2 = ArrayFromJSON(unit, times_seconds_precision2);
+ CheckScalarBinary("years_between", arr1, arr1, ArrayFromJSON(int64(),
zeros));
+ CheckScalarBinary("years_between", arr1, arr2, ArrayFromJSON(int64(),
years_between));
+ CheckScalarBinary("quarters_between", arr1, arr1, ArrayFromJSON(int64(),
zeros));
+ CheckScalarBinary("quarters_between", arr1, arr2,
+ ArrayFromJSON(int64(), quarters_between));
+ CheckScalarBinary("months_between", arr1, arr1, ArrayFromJSON(int64(),
zeros));
+ CheckScalarBinary("months_between", arr1, arr1, ArrayFromJSON(int64(),
zeros));
+ CheckScalarBinary("months_between", arr1, arr2,
+ ArrayFromJSON(int64(), months_between));
+ CheckScalarBinary("month_interval_between", arr1, arr1,
+ ArrayFromJSON(month_interval(), zeros));
+ CheckScalarBinary("month_interval_between", arr1, arr1,
+ ArrayFromJSON(month_interval(), zeros));
+ CheckScalarBinary("month_interval_between", arr1, arr2,
+ ArrayFromJSON(month_interval(), months_between));
+ CheckScalarBinary(
+ "month_day_nano_interval_between", arr1, arr1,
+ ArrayFromJSON(month_day_nano_interval(),
month_day_nano_interval_between_zeros));
+ CheckScalarBinary(
+ "month_day_nano_interval_between", arr1, arr2,
+ ArrayFromJSON(month_day_nano_interval(),
month_day_nano_interval_between));
+ CheckScalarBinary(
+ "day_time_interval_between", arr1, arr1,
+ ArrayFromJSON(day_time_interval(), day_time_interval_between_zeros));
+ CheckScalarBinary("day_time_interval_between", arr1, arr2,
+ ArrayFromJSON(day_time_interval(),
day_time_interval_between));
+ CheckScalarBinary("weeks_between", arr1, arr1, ArrayFromJSON(int64(),
zeros));
+ CheckScalarBinary("weeks_between", arr1, arr2, ArrayFromJSON(int64(),
weeks_between));
+ CheckScalarBinary("days_between", arr1, arr1, ArrayFromJSON(int64(),
zeros));
+ CheckScalarBinary("days_between", arr1, arr2, ArrayFromJSON(int64(),
days_between));
+ CheckScalarBinary("hours_between", arr1, arr1, ArrayFromJSON(int64(),
zeros));
+ CheckScalarBinary("hours_between", arr1, arr2, ArrayFromJSON(int64(),
hours_between));
+ CheckScalarBinary("minutes_between", arr1, arr1, ArrayFromJSON(int64(),
zeros));
+ CheckScalarBinary("minutes_between", arr1, arr2,
+ ArrayFromJSON(int64(), minutes_between));
+ CheckScalarBinary("seconds_between", arr1, arr1, ArrayFromJSON(int64(),
zeros));
+ CheckScalarBinary("seconds_between", arr1, arr2,
+ ArrayFromJSON(int64(), seconds_between));
+ CheckScalarBinary("milliseconds_between", arr1, arr1,
ArrayFromJSON(int64(), zeros));
+ CheckScalarBinary("milliseconds_between", arr1, arr2,
+ ArrayFromJSON(int64(), milliseconds_between));
+ CheckScalarBinary("microseconds_between", arr1, arr1,
ArrayFromJSON(int64(), zeros));
+ CheckScalarBinary("microseconds_between", arr1, arr2,
+ ArrayFromJSON(int64(), microseconds_between));
+ CheckScalarBinary("nanoseconds_between", arr1, arr1,
ArrayFromJSON(int64(), zeros));
+ CheckScalarBinary("nanoseconds_between", arr1, arr2,
+ ArrayFromJSON(int64(), nanoseconds_between));
+ }
+}
+
+TEST_F(ScalarTemporalTest, TestTemporalDifferenceWeeks) {
+ auto ty = timestamp(TimeUnit::SECOND);
+ auto days = ArrayFromJSON(ty, R"([
+ "2021-08-09", "2021-08-10", "2021-08-11", "2021-08-12", "2021-08-13",
"2021-08-14", "2021-08-15",
+ "2021-08-16", "2021-08-17", "2021-08-18", "2021-08-19", "2021-08-20",
"2021-08-21", "2021-08-22",
+ "2021-08-23", "2021-08-24", "2021-08-25", "2021-08-26", "2021-08-27",
"2021-08-28", "2021-08-29"
+ ])");
+
+ DayOfWeekOptions options(/*one_based_numbering=*/false,
/*week_start=Monday*/ 1);
+ EXPECT_THAT(CallFunction("weeks_between",
+ {
+ *MakeArrayFromScalar(
Review comment:
Is `MakeArrayFromScalar` necessary or is the kernel able to broadcast a
scalar input?
##########
File path: docs/source/cpp/compute.rst
##########
@@ -1364,6 +1364,50 @@ For timestamps inputs with non-empty timezone, localized
timestamp components wi
.. _ISO 8601 week date definition:
https://en.wikipedia.org/wiki/ISO_week_date#First_week
+Temporal difference
+~~~~~~~~~~~~~~~~~~~
+
+These functions compute the difference between two timestamps in the
+specified unit. The difference is determined by the number of
+boundaries crossed, not the span of time. For example, the difference
+in days between 23:59:59 on one day and 00:00:01 on the next day is
+one day (since midnight was crossed), not zero days (even though less
+than 24 hours elapsed). Additionally, if the timestamp is zoned, the
+difference is calculated in the local timezone. For instance, the
+difference in years between "2019-12-31 18:00:00-0500" and "2019-12-31
+23:00:00-0500" is zero years, even though the underlying values in UTC
Review comment:
I would say "because the local year is the same even though the UTC
years would be different".
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]