pitrou commented on a change in pull request #10960:
URL: https://github.com/apache/arrow/pull/10960#discussion_r712210124



##########
File path: cpp/src/arrow/util/formatting_util_test.cc
##########
@@ -427,4 +427,42 @@ TEST(Formatting, Timestamp) {
   }
 }
 
+TEST(Formatting, Interval) {
+  using DayMilliseconds = DayTimeIntervalType::DayMilliseconds;
+  using MonthDayNanos = MonthDayNanoIntervalType::MonthDayNanos;
+
+  const int32_t max_int32 = std::numeric_limits<int32_t>::max();
+  const int32_t min_int32 = std::numeric_limits<int32_t>::min();
+  const int64_t max_int64 = std::numeric_limits<int64_t>::max();
+  const int64_t min_int64 = std::numeric_limits<int64_t>::min();
+  {
+    StringFormatter<MonthIntervalType> formatter(month_interval());
+
+    AssertFormatting(formatter, 0, "0m");

Review comment:
       Is it a recommended format for months? Should we use "M" instead?

##########
File path: docs/source/cpp/compute.rst
##########
@@ -1364,6 +1364,50 @@ For timestamps inputs with non-empty timezone, localized 
timestamp components wi
 
 .. _ISO 8601 week date definition: 
https://en.wikipedia.org/wiki/ISO_week_date#First_week
 
+Temporal difference
+~~~~~~~~~~~~~~~~~~~
+
+These functions compute the difference between two timestamps in the
+specified unit. The difference is determined by the number of
+boundaries crossed, not the span of time. For example, the difference
+in days between 23:59:59 on one day and 00:00:01 on the next day is
+one day (since midnight was crossed), not zero days (even though less
+than 24 hours elapsed). Additionally, if the timestamp is zoned, the
+difference is calculated in the local timezone. For instance, the
+difference in years between "2019-12-31 18:00:00-0500" and "2019-12-31
+23:00:00-0500" is zero years, even though the underlying values in UTC
+do differ by a year.
+
++---------------------------------+------------+-------------------+-----------------------+----------------------------+-------+
+| Function name                   | Arity      | Input types       | Output 
type           | Options class              | Notes |

Review comment:
       If the "notes" column is empty, then can omit it.

##########
File path: docs/source/cpp/compute.rst
##########
@@ -1356,6 +1356,50 @@ For timestamps inputs with non-empty timezone, localized 
timestamp components wi
 
 .. _ISO 8601 week date definition: 
https://en.wikipedia.org/wiki/ISO_week_date#First_week
 
+Temporal difference
+~~~~~~~~~~~~~~~~~~~
+
+These functions compute the difference between two timestamps in the
+specified unit. The difference is determined by the number of
+boundaries crossed, not the span of time. For example, the difference
+in days between 23:59:59 on one day and 00:00:01 on the next day is
+one day (since midnight was crossed), not zero days (even though less
+than 24 hours elapsed). Additionally, if the timestamp is zoned, the
+difference is calculated in the local timezone. For instance, the
+difference in years between "2019-12-31 18:00:00-0500" and "2019-12-31
+23:00:00-0500" is zero years, even though the underlying values in UTC
+do differ by a year.
+
++---------------------------------+------------+-------------------+-----------------------+----------------------------+-------+
+| Function name                   | Arity      | Input types       | Output 
type           | Options class              | Notes |
++=================================+============+===================+=======================+============================+=======+
+| day_time_interval_between       | Binary     | Timestamp         | DayTime 
interval      |                            |       |
++---------------------------------+------------+-------------------+-----------------------+----------------------------+-------+
+| days_between                    | Binary     | Timestamp         | Int64     
            |                            |       |
++---------------------------------+------------+-------------------+-----------------------+----------------------------+-------+
+| hours_between                   | Binary     | Timestamp         | Int64     
            |                            |       |
++---------------------------------+------------+-------------------+-----------------------+----------------------------+-------+
+| microseconds_between            | Binary     | Timestamp         | Int64     
            |                            |       |
++---------------------------------+------------+-------------------+-----------------------+----------------------------+-------+
+| milliseconds_between            | Binary     | Timestamp         | Int64     
            |                            |       |
++---------------------------------+------------+-------------------+-----------------------+----------------------------+-------+
+| minutes_between                 | Binary     | Timestamp         | Int64     
            |                            |       |
++---------------------------------+------------+-------------------+-----------------------+----------------------------+-------+
+| month_interval_between          | Binary     | Timestamp         | Month 
interval        |                            |       |

Review comment:
       Well, is it useful to have a `int64_t` return?

##########
File path: docs/source/cpp/compute.rst
##########
@@ -1364,6 +1364,50 @@ For timestamps inputs with non-empty timezone, localized 
timestamp components wi
 
 .. _ISO 8601 week date definition: 
https://en.wikipedia.org/wiki/ISO_week_date#First_week
 
+Temporal difference
+~~~~~~~~~~~~~~~~~~~
+
+These functions compute the difference between two timestamps in the
+specified unit. The difference is determined by the number of
+boundaries crossed, not the span of time. For example, the difference
+in days between 23:59:59 on one day and 00:00:01 on the next day is
+one day (since midnight was crossed), not zero days (even though less
+than 24 hours elapsed). Additionally, if the timestamp is zoned, the

Review comment:
       "has a defined timezone" is less weird than "is zoned".

##########
File path: cpp/src/arrow/compute/kernels/scalar_temporal.cc
##########
@@ -1103,6 +1399,114 @@ const FunctionDoc assume_timezone_doc{
     {"timestamps"},
     "AssumeTimezoneOptions"};
 
+const FunctionDoc years_between_doc{
+    "Compute the number of years between two timestamps",
+    ("Returns the number of year boundaries crossed from the first timestamp 
to the\n"

Review comment:
       How about: ```Return the number of year boundaries crossed from `start` 
to `end`.```

##########
File path: cpp/src/arrow/compute/kernels/scalar_temporal.cc
##########
@@ -191,6 +198,60 @@ struct TemporalComponentExtract
   }
 };
 
+Status CheckTimezones(const ExecBatch& batch) {
+  const auto& timezone = GetInputTimezone(batch.values[0]);
+  for (int i = 1; i < batch.num_values(); i++) {
+    const auto& other_timezone = GetInputTimezone(batch.values[i]);
+    if (other_timezone != timezone) {
+      return Status::TypeError("Got differing time zone '", other_timezone,
+                               "' for argument ", i + 1, "; expected '", 
timezone, "'");
+    }
+  }
+  return Status::OK();
+}
+
+Status ValidateDayOfWeekOptions(const DayOfWeekOptions& options) {

Review comment:
       At some point, we may instead define a `Validate()` method on 
FunctionOptions.

##########
File path: cpp/src/arrow/compute/kernels/scalar_temporal.cc
##########
@@ -191,6 +198,60 @@ struct TemporalComponentExtract
   }
 };
 
+Status CheckTimezones(const ExecBatch& batch) {
+  const auto& timezone = GetInputTimezone(batch.values[0]);
+  for (int i = 1; i < batch.num_values(); i++) {
+    const auto& other_timezone = GetInputTimezone(batch.values[i]);
+    if (other_timezone != timezone) {
+      return Status::TypeError("Got differing time zone '", other_timezone,
+                               "' for argument ", i + 1, "; expected '", 
timezone, "'");
+    }
+  }
+  return Status::OK();
+}
+
+Status ValidateDayOfWeekOptions(const DayOfWeekOptions& options) {
+  if (options.week_start < 1 || 7 < options.week_start) {
+    return Status::Invalid(
+        "week_start must follow ISO convention (Monday=1, Sunday=7). Got 
week_start=",
+        options.week_start);
+  }
+  return Status::OK();
+}
+
+template <template <typename...> class Op, typename Duration, typename InType,
+          typename OutType>
+struct TemporalComponentExtractBinary {

Review comment:
       Nit, but "Extract" refers to component extraction, which this isn't 
doing?

##########
File path: cpp/src/arrow/compute/kernels/scalar_temporal_test.cc
##########
@@ -433,6 +512,173 @@ TEST_F(ScalarTemporalTest, DayOfWeek) {
                                                        /*week_start=*/8)));
 }
 
+TEST_F(ScalarTemporalTest, TestTemporalDifference) {
+  for (auto u : TimeUnit::values()) {
+    auto unit = timestamp(u);
+    auto arr1 = ArrayFromJSON(unit, times_seconds_precision);
+    auto arr2 = ArrayFromJSON(unit, times_seconds_precision2);
+    CheckScalarBinary("years_between", arr1, arr1, ArrayFromJSON(int64(), 
zeros));
+    CheckScalarBinary("years_between", arr1, arr2, ArrayFromJSON(int64(), 
years_between));
+    CheckScalarBinary("quarters_between", arr1, arr1, ArrayFromJSON(int64(), 
zeros));
+    CheckScalarBinary("quarters_between", arr1, arr2,
+                      ArrayFromJSON(int64(), quarters_between));
+    CheckScalarBinary("months_between", arr1, arr1, ArrayFromJSON(int64(), 
zeros));
+    CheckScalarBinary("months_between", arr1, arr1, ArrayFromJSON(int64(), 
zeros));
+    CheckScalarBinary("months_between", arr1, arr2,
+                      ArrayFromJSON(int64(), months_between));
+    CheckScalarBinary("month_interval_between", arr1, arr1,
+                      ArrayFromJSON(month_interval(), zeros));
+    CheckScalarBinary("month_interval_between", arr1, arr1,
+                      ArrayFromJSON(month_interval(), zeros));
+    CheckScalarBinary("month_interval_between", arr1, arr2,
+                      ArrayFromJSON(month_interval(), months_between));
+    CheckScalarBinary(
+        "month_day_nano_interval_between", arr1, arr1,
+        ArrayFromJSON(month_day_nano_interval(), 
month_day_nano_interval_between_zeros));
+    CheckScalarBinary(
+        "month_day_nano_interval_between", arr1, arr2,
+        ArrayFromJSON(month_day_nano_interval(), 
month_day_nano_interval_between));
+    CheckScalarBinary(
+        "day_time_interval_between", arr1, arr1,
+        ArrayFromJSON(day_time_interval(), day_time_interval_between_zeros));
+    CheckScalarBinary("day_time_interval_between", arr1, arr2,
+                      ArrayFromJSON(day_time_interval(), 
day_time_interval_between));
+    CheckScalarBinary("weeks_between", arr1, arr1, ArrayFromJSON(int64(), 
zeros));
+    CheckScalarBinary("weeks_between", arr1, arr2, ArrayFromJSON(int64(), 
weeks_between));
+    CheckScalarBinary("days_between", arr1, arr1, ArrayFromJSON(int64(), 
zeros));
+    CheckScalarBinary("days_between", arr1, arr2, ArrayFromJSON(int64(), 
days_between));
+    CheckScalarBinary("hours_between", arr1, arr1, ArrayFromJSON(int64(), 
zeros));
+    CheckScalarBinary("hours_between", arr1, arr2, ArrayFromJSON(int64(), 
hours_between));
+    CheckScalarBinary("minutes_between", arr1, arr1, ArrayFromJSON(int64(), 
zeros));
+    CheckScalarBinary("minutes_between", arr1, arr2,
+                      ArrayFromJSON(int64(), minutes_between));
+    CheckScalarBinary("seconds_between", arr1, arr1, ArrayFromJSON(int64(), 
zeros));
+    CheckScalarBinary("seconds_between", arr1, arr2,
+                      ArrayFromJSON(int64(), seconds_between));
+    CheckScalarBinary("milliseconds_between", arr1, arr1, 
ArrayFromJSON(int64(), zeros));
+    CheckScalarBinary("milliseconds_between", arr1, arr2,
+                      ArrayFromJSON(int64(), milliseconds_between));
+    CheckScalarBinary("microseconds_between", arr1, arr1, 
ArrayFromJSON(int64(), zeros));
+    CheckScalarBinary("microseconds_between", arr1, arr2,
+                      ArrayFromJSON(int64(), microseconds_between));
+    CheckScalarBinary("nanoseconds_between", arr1, arr1, 
ArrayFromJSON(int64(), zeros));
+    CheckScalarBinary("nanoseconds_between", arr1, arr2,
+                      ArrayFromJSON(int64(), nanoseconds_between));
+  }
+}
+
+TEST_F(ScalarTemporalTest, TestTemporalDifferenceWeeks) {
+  auto ty = timestamp(TimeUnit::SECOND);
+  auto days = ArrayFromJSON(ty, R"([
+    "2021-08-09", "2021-08-10", "2021-08-11", "2021-08-12", "2021-08-13", 
"2021-08-14", "2021-08-15",
+    "2021-08-16", "2021-08-17", "2021-08-18", "2021-08-19", "2021-08-20", 
"2021-08-21", "2021-08-22",
+    "2021-08-23", "2021-08-24", "2021-08-25", "2021-08-26", "2021-08-27", 
"2021-08-28", "2021-08-29"
+  ])");
+
+  DayOfWeekOptions options(/*one_based_numbering=*/false, 
/*week_start=Monday*/ 1);
+  EXPECT_THAT(CallFunction("weeks_between",
+                           {
+                               *MakeArrayFromScalar(

Review comment:
       Is `MakeArrayFromScalar` necessary or is the kernel able to broadcast a 
scalar input?

##########
File path: docs/source/cpp/compute.rst
##########
@@ -1364,6 +1364,50 @@ For timestamps inputs with non-empty timezone, localized 
timestamp components wi
 
 .. _ISO 8601 week date definition: 
https://en.wikipedia.org/wiki/ISO_week_date#First_week
 
+Temporal difference
+~~~~~~~~~~~~~~~~~~~
+
+These functions compute the difference between two timestamps in the
+specified unit. The difference is determined by the number of
+boundaries crossed, not the span of time. For example, the difference
+in days between 23:59:59 on one day and 00:00:01 on the next day is
+one day (since midnight was crossed), not zero days (even though less
+than 24 hours elapsed). Additionally, if the timestamp is zoned, the
+difference is calculated in the local timezone. For instance, the
+difference in years between "2019-12-31 18:00:00-0500" and "2019-12-31
+23:00:00-0500" is zero years, even though the underlying values in UTC

Review comment:
       I would say "because the local year is the same even though the UTC 
years would be different".




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Reply via email to