pitrou commented on a change in pull request #10960: URL: https://github.com/apache/arrow/pull/10960#discussion_r712210124
########## File path: cpp/src/arrow/util/formatting_util_test.cc ########## @@ -427,4 +427,42 @@ TEST(Formatting, Timestamp) { } } +TEST(Formatting, Interval) { + using DayMilliseconds = DayTimeIntervalType::DayMilliseconds; + using MonthDayNanos = MonthDayNanoIntervalType::MonthDayNanos; + + const int32_t max_int32 = std::numeric_limits<int32_t>::max(); + const int32_t min_int32 = std::numeric_limits<int32_t>::min(); + const int64_t max_int64 = std::numeric_limits<int64_t>::max(); + const int64_t min_int64 = std::numeric_limits<int64_t>::min(); + { + StringFormatter<MonthIntervalType> formatter(month_interval()); + + AssertFormatting(formatter, 0, "0m"); Review comment: Is it a recommended format for months? Should we use "M" instead? ########## File path: docs/source/cpp/compute.rst ########## @@ -1364,6 +1364,50 @@ For timestamps inputs with non-empty timezone, localized timestamp components wi .. _ISO 8601 week date definition: https://en.wikipedia.org/wiki/ISO_week_date#First_week +Temporal difference +~~~~~~~~~~~~~~~~~~~ + +These functions compute the difference between two timestamps in the +specified unit. The difference is determined by the number of +boundaries crossed, not the span of time. For example, the difference +in days between 23:59:59 on one day and 00:00:01 on the next day is +one day (since midnight was crossed), not zero days (even though less +than 24 hours elapsed). Additionally, if the timestamp is zoned, the +difference is calculated in the local timezone. For instance, the +difference in years between "2019-12-31 18:00:00-0500" and "2019-12-31 +23:00:00-0500" is zero years, even though the underlying values in UTC +do differ by a year. + ++---------------------------------+------------+-------------------+-----------------------+----------------------------+-------+ +| Function name | Arity | Input types | Output type | Options class | Notes | Review comment: If the "notes" column is empty, then can omit it. ########## File path: docs/source/cpp/compute.rst ########## @@ -1356,6 +1356,50 @@ For timestamps inputs with non-empty timezone, localized timestamp components wi .. _ISO 8601 week date definition: https://en.wikipedia.org/wiki/ISO_week_date#First_week +Temporal difference +~~~~~~~~~~~~~~~~~~~ + +These functions compute the difference between two timestamps in the +specified unit. The difference is determined by the number of +boundaries crossed, not the span of time. For example, the difference +in days between 23:59:59 on one day and 00:00:01 on the next day is +one day (since midnight was crossed), not zero days (even though less +than 24 hours elapsed). Additionally, if the timestamp is zoned, the +difference is calculated in the local timezone. For instance, the +difference in years between "2019-12-31 18:00:00-0500" and "2019-12-31 +23:00:00-0500" is zero years, even though the underlying values in UTC +do differ by a year. + ++---------------------------------+------------+-------------------+-----------------------+----------------------------+-------+ +| Function name | Arity | Input types | Output type | Options class | Notes | ++=================================+============+===================+=======================+============================+=======+ +| day_time_interval_between | Binary | Timestamp | DayTime interval | | | ++---------------------------------+------------+-------------------+-----------------------+----------------------------+-------+ +| days_between | Binary | Timestamp | Int64 | | | ++---------------------------------+------------+-------------------+-----------------------+----------------------------+-------+ +| hours_between | Binary | Timestamp | Int64 | | | ++---------------------------------+------------+-------------------+-----------------------+----------------------------+-------+ +| microseconds_between | Binary | Timestamp | Int64 | | | ++---------------------------------+------------+-------------------+-----------------------+----------------------------+-------+ +| milliseconds_between | Binary | Timestamp | Int64 | | | ++---------------------------------+------------+-------------------+-----------------------+----------------------------+-------+ +| minutes_between | Binary | Timestamp | Int64 | | | ++---------------------------------+------------+-------------------+-----------------------+----------------------------+-------+ +| month_interval_between | Binary | Timestamp | Month interval | | | Review comment: Well, is it useful to have a `int64_t` return? ########## File path: docs/source/cpp/compute.rst ########## @@ -1364,6 +1364,50 @@ For timestamps inputs with non-empty timezone, localized timestamp components wi .. _ISO 8601 week date definition: https://en.wikipedia.org/wiki/ISO_week_date#First_week +Temporal difference +~~~~~~~~~~~~~~~~~~~ + +These functions compute the difference between two timestamps in the +specified unit. The difference is determined by the number of +boundaries crossed, not the span of time. For example, the difference +in days between 23:59:59 on one day and 00:00:01 on the next day is +one day (since midnight was crossed), not zero days (even though less +than 24 hours elapsed). Additionally, if the timestamp is zoned, the Review comment: "has a defined timezone" is less weird than "is zoned". ########## File path: cpp/src/arrow/compute/kernels/scalar_temporal.cc ########## @@ -1103,6 +1399,114 @@ const FunctionDoc assume_timezone_doc{ {"timestamps"}, "AssumeTimezoneOptions"}; +const FunctionDoc years_between_doc{ + "Compute the number of years between two timestamps", + ("Returns the number of year boundaries crossed from the first timestamp to the\n" Review comment: How about: ```Return the number of year boundaries crossed from `start` to `end`.``` ########## File path: cpp/src/arrow/compute/kernels/scalar_temporal.cc ########## @@ -191,6 +198,60 @@ struct TemporalComponentExtract } }; +Status CheckTimezones(const ExecBatch& batch) { + const auto& timezone = GetInputTimezone(batch.values[0]); + for (int i = 1; i < batch.num_values(); i++) { + const auto& other_timezone = GetInputTimezone(batch.values[i]); + if (other_timezone != timezone) { + return Status::TypeError("Got differing time zone '", other_timezone, + "' for argument ", i + 1, "; expected '", timezone, "'"); + } + } + return Status::OK(); +} + +Status ValidateDayOfWeekOptions(const DayOfWeekOptions& options) { Review comment: At some point, we may instead define a `Validate()` method on FunctionOptions. ########## File path: cpp/src/arrow/compute/kernels/scalar_temporal.cc ########## @@ -191,6 +198,60 @@ struct TemporalComponentExtract } }; +Status CheckTimezones(const ExecBatch& batch) { + const auto& timezone = GetInputTimezone(batch.values[0]); + for (int i = 1; i < batch.num_values(); i++) { + const auto& other_timezone = GetInputTimezone(batch.values[i]); + if (other_timezone != timezone) { + return Status::TypeError("Got differing time zone '", other_timezone, + "' for argument ", i + 1, "; expected '", timezone, "'"); + } + } + return Status::OK(); +} + +Status ValidateDayOfWeekOptions(const DayOfWeekOptions& options) { + if (options.week_start < 1 || 7 < options.week_start) { + return Status::Invalid( + "week_start must follow ISO convention (Monday=1, Sunday=7). Got week_start=", + options.week_start); + } + return Status::OK(); +} + +template <template <typename...> class Op, typename Duration, typename InType, + typename OutType> +struct TemporalComponentExtractBinary { Review comment: Nit, but "Extract" refers to component extraction, which this isn't doing? ########## File path: cpp/src/arrow/compute/kernels/scalar_temporal_test.cc ########## @@ -433,6 +512,173 @@ TEST_F(ScalarTemporalTest, DayOfWeek) { /*week_start=*/8))); } +TEST_F(ScalarTemporalTest, TestTemporalDifference) { + for (auto u : TimeUnit::values()) { + auto unit = timestamp(u); + auto arr1 = ArrayFromJSON(unit, times_seconds_precision); + auto arr2 = ArrayFromJSON(unit, times_seconds_precision2); + CheckScalarBinary("years_between", arr1, arr1, ArrayFromJSON(int64(), zeros)); + CheckScalarBinary("years_between", arr1, arr2, ArrayFromJSON(int64(), years_between)); + CheckScalarBinary("quarters_between", arr1, arr1, ArrayFromJSON(int64(), zeros)); + CheckScalarBinary("quarters_between", arr1, arr2, + ArrayFromJSON(int64(), quarters_between)); + CheckScalarBinary("months_between", arr1, arr1, ArrayFromJSON(int64(), zeros)); + CheckScalarBinary("months_between", arr1, arr1, ArrayFromJSON(int64(), zeros)); + CheckScalarBinary("months_between", arr1, arr2, + ArrayFromJSON(int64(), months_between)); + CheckScalarBinary("month_interval_between", arr1, arr1, + ArrayFromJSON(month_interval(), zeros)); + CheckScalarBinary("month_interval_between", arr1, arr1, + ArrayFromJSON(month_interval(), zeros)); + CheckScalarBinary("month_interval_between", arr1, arr2, + ArrayFromJSON(month_interval(), months_between)); + CheckScalarBinary( + "month_day_nano_interval_between", arr1, arr1, + ArrayFromJSON(month_day_nano_interval(), month_day_nano_interval_between_zeros)); + CheckScalarBinary( + "month_day_nano_interval_between", arr1, arr2, + ArrayFromJSON(month_day_nano_interval(), month_day_nano_interval_between)); + CheckScalarBinary( + "day_time_interval_between", arr1, arr1, + ArrayFromJSON(day_time_interval(), day_time_interval_between_zeros)); + CheckScalarBinary("day_time_interval_between", arr1, arr2, + ArrayFromJSON(day_time_interval(), day_time_interval_between)); + CheckScalarBinary("weeks_between", arr1, arr1, ArrayFromJSON(int64(), zeros)); + CheckScalarBinary("weeks_between", arr1, arr2, ArrayFromJSON(int64(), weeks_between)); + CheckScalarBinary("days_between", arr1, arr1, ArrayFromJSON(int64(), zeros)); + CheckScalarBinary("days_between", arr1, arr2, ArrayFromJSON(int64(), days_between)); + CheckScalarBinary("hours_between", arr1, arr1, ArrayFromJSON(int64(), zeros)); + CheckScalarBinary("hours_between", arr1, arr2, ArrayFromJSON(int64(), hours_between)); + CheckScalarBinary("minutes_between", arr1, arr1, ArrayFromJSON(int64(), zeros)); + CheckScalarBinary("minutes_between", arr1, arr2, + ArrayFromJSON(int64(), minutes_between)); + CheckScalarBinary("seconds_between", arr1, arr1, ArrayFromJSON(int64(), zeros)); + CheckScalarBinary("seconds_between", arr1, arr2, + ArrayFromJSON(int64(), seconds_between)); + CheckScalarBinary("milliseconds_between", arr1, arr1, ArrayFromJSON(int64(), zeros)); + CheckScalarBinary("milliseconds_between", arr1, arr2, + ArrayFromJSON(int64(), milliseconds_between)); + CheckScalarBinary("microseconds_between", arr1, arr1, ArrayFromJSON(int64(), zeros)); + CheckScalarBinary("microseconds_between", arr1, arr2, + ArrayFromJSON(int64(), microseconds_between)); + CheckScalarBinary("nanoseconds_between", arr1, arr1, ArrayFromJSON(int64(), zeros)); + CheckScalarBinary("nanoseconds_between", arr1, arr2, + ArrayFromJSON(int64(), nanoseconds_between)); + } +} + +TEST_F(ScalarTemporalTest, TestTemporalDifferenceWeeks) { + auto ty = timestamp(TimeUnit::SECOND); + auto days = ArrayFromJSON(ty, R"([ + "2021-08-09", "2021-08-10", "2021-08-11", "2021-08-12", "2021-08-13", "2021-08-14", "2021-08-15", + "2021-08-16", "2021-08-17", "2021-08-18", "2021-08-19", "2021-08-20", "2021-08-21", "2021-08-22", + "2021-08-23", "2021-08-24", "2021-08-25", "2021-08-26", "2021-08-27", "2021-08-28", "2021-08-29" + ])"); + + DayOfWeekOptions options(/*one_based_numbering=*/false, /*week_start=Monday*/ 1); + EXPECT_THAT(CallFunction("weeks_between", + { + *MakeArrayFromScalar( Review comment: Is `MakeArrayFromScalar` necessary or is the kernel able to broadcast a scalar input? ########## File path: docs/source/cpp/compute.rst ########## @@ -1364,6 +1364,50 @@ For timestamps inputs with non-empty timezone, localized timestamp components wi .. _ISO 8601 week date definition: https://en.wikipedia.org/wiki/ISO_week_date#First_week +Temporal difference +~~~~~~~~~~~~~~~~~~~ + +These functions compute the difference between two timestamps in the +specified unit. The difference is determined by the number of +boundaries crossed, not the span of time. For example, the difference +in days between 23:59:59 on one day and 00:00:01 on the next day is +one day (since midnight was crossed), not zero days (even though less +than 24 hours elapsed). Additionally, if the timestamp is zoned, the +difference is calculated in the local timezone. For instance, the +difference in years between "2019-12-31 18:00:00-0500" and "2019-12-31 +23:00:00-0500" is zero years, even though the underlying values in UTC Review comment: I would say "because the local year is the same even though the UTC years would be different". -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org