Petar Nikic created SPARK-56820:
-----------------------------------

             Summary: counter_diff window function for converting cumulative 
counters to delta format
                 Key: SPARK-56820
                 URL: https://issues.apache.org/jira/browse/SPARK-56820
             Project: Spark
          Issue Type: New Feature
          Components: SQL
    Affects Versions: 4.2.0
            Reporter: Petar Nikic


h2. Summary

Create a new window function, {{counter_diff}} for converting a cumulative 
counter metric to delta format by computing the difference between consecutive 
counter values within a time series.
h2. Syntax
{code:sql}
{{counter_diff ( value [, start_time] ) OVER clause}}
{code}
h2. Arguments
 * {*}value{*}: A cumulative counter. Must be numeric and non-negative.
 * {*}start_time{*}: An optional timestamp parameter which indicates when the 
counter was last set or reset to zero. It is used to signal counter resets.
 * {*}OVER clause{*}: The clause describing the windowing.
 ** {*}PARTITION BY{*}: used to separate independent counters. Good 
partitioning columns would be the metric name, as well as any attributes tied 
to the metric.
 ** {*}ORDER BY{*}: used to order the observations by the associated timestamp 
in ascending order.

h2. Returns

Returns the difference between the current counter value and the previous 
counter value, according to the order defined by ORDER BY in the OVER clause. 
The return type is the same as {{{}value{}}}.

Counter resets are handled by returning {{{}NULL{}}}. Resets are detected in 
two cases:
 * If the previous {{value}} is greater than the current {{{}value{}}}.
 * If {{start_time}} is provided, and the current time is greater than the 
previous one.

In addition, {{NULL}} is returned for the first row of the partition.

Errors are raised in the following cases:
 * the provided counter is negative or,
 * the {{start_time}} moves backwards in time.

h2. Examples

*_Example with a simple counter_*
{code:sql}
SELECT m, t, counter_diff(c) OVER (PARTITION BY m ORDER BY t)
FROM VALUES
  (‘http_requests’, TIMESTAMP ‘2026-01-01T00:00:00’, 100),
  (‘http_requests’, TIMESTAMP ‘2026-01-01T00:01:00’, 200),
  (‘http_requests’, TIMESTAMP ‘2026-01-01T00:02:00’, 400)
    AS tab (m, t, c)
{code}
{code:java}
http_requests   2026-01-01 00:00:00     NULL (first row)
http_requests   2026-01-01 00:01:00     100
http_requests   2026-01-01 00:02:00     200
{code}
*_Example with a counter reset_*
{code:sql}
SELECT m, t, counter_diff(c) OVER (PARTITION BY m ORDER BY t)
FROM VALUES
  (‘http_requests’, TIMESTAMP ‘2026-01-01T00:00:00’, 100),
  (‘http_requests’, TIMESTAMP ‘2026-01-01T00:01:00’, 200),
  (‘http_requests’, TIMESTAMP ‘2026-01-01T00:02:00’, 400),
  (‘http_requests’, TIMESTAMP ‘2026-01-01T00:04:00’, 50)
  AS tab (m, t, c)
{code}
{code:java}
http_requests   2026-01-01 00:00:00     NULL
http_requests   2026-01-01 00:01:00     100
http_requests   2026-01-01 00:02:00     200
http_requests   2026-01-01 00:04:00     NULL (counter: 400 -> 50)
{code}
*_Example with a counter reset with increased start_time_*
{code:sql}
SELECT m, t, counter_diff(c, st) OVER (PARTITION BY m ORDER BY t)
FROM VALUES
  (‘http_requests’, TIMESTAMP ‘2026-01-01T00:01:00’,
   TIMESTAMP ‘2026-01-01T00:00:00’, 100),
  (‘http_requests’, TIMESTAMP ‘2026-01-01T00:02:00’,
   TIMESTAMP ‘2026-01-01T00:00:00’, 200),
  (‘http_requests’, TIMESTAMP ‘2026-01-01T00:04:00’,
   TIMESTAMP ‘2026-01-01T00:02:31’, 400)
  AS tab (m, t, st, c)
{code}
{code:java}
http_requests   2026-01-01 00:01:00     NULL
http_requests   2026-01-01 00:02:00     100
http_requests   2026-01-01 00:04:00     NULL (start_time: 00:00 -> 02:31)
{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to