Petar Nikic created SPARK-56820:
-----------------------------------
Summary: counter_diff window function for converting cumulative
counters to delta format
Key: SPARK-56820
URL: https://issues.apache.org/jira/browse/SPARK-56820
Project: Spark
Issue Type: New Feature
Components: SQL
Affects Versions: 4.2.0
Reporter: Petar Nikic
h2. Summary
Create a new window function, {{counter_diff}} for converting a cumulative
counter metric to delta format by computing the difference between consecutive
counter values within a time series.
h2. Syntax
{code:sql}
{{counter_diff ( value [, start_time] ) OVER clause}}
{code}
h2. Arguments
* {*}value{*}: A cumulative counter. Must be numeric and non-negative.
* {*}start_time{*}: An optional timestamp parameter which indicates when the
counter was last set or reset to zero. It is used to signal counter resets.
* {*}OVER clause{*}: The clause describing the windowing.
** {*}PARTITION BY{*}: used to separate independent counters. Good
partitioning columns would be the metric name, as well as any attributes tied
to the metric.
** {*}ORDER BY{*}: used to order the observations by the associated timestamp
in ascending order.
h2. Returns
Returns the difference between the current counter value and the previous
counter value, according to the order defined by ORDER BY in the OVER clause.
The return type is the same as {{{}value{}}}.
Counter resets are handled by returning {{{}NULL{}}}. Resets are detected in
two cases:
* If the previous {{value}} is greater than the current {{{}value{}}}.
* If {{start_time}} is provided, and the current time is greater than the
previous one.
In addition, {{NULL}} is returned for the first row of the partition.
Errors are raised in the following cases:
* the provided counter is negative or,
* the {{start_time}} moves backwards in time.
h2. Examples
*_Example with a simple counter_*
{code:sql}
SELECT m, t, counter_diff(c) OVER (PARTITION BY m ORDER BY t)
FROM VALUES
(‘http_requests’, TIMESTAMP ‘2026-01-01T00:00:00’, 100),
(‘http_requests’, TIMESTAMP ‘2026-01-01T00:01:00’, 200),
(‘http_requests’, TIMESTAMP ‘2026-01-01T00:02:00’, 400)
AS tab (m, t, c)
{code}
{code:java}
http_requests 2026-01-01 00:00:00 NULL (first row)
http_requests 2026-01-01 00:01:00 100
http_requests 2026-01-01 00:02:00 200
{code}
*_Example with a counter reset_*
{code:sql}
SELECT m, t, counter_diff(c) OVER (PARTITION BY m ORDER BY t)
FROM VALUES
(‘http_requests’, TIMESTAMP ‘2026-01-01T00:00:00’, 100),
(‘http_requests’, TIMESTAMP ‘2026-01-01T00:01:00’, 200),
(‘http_requests’, TIMESTAMP ‘2026-01-01T00:02:00’, 400),
(‘http_requests’, TIMESTAMP ‘2026-01-01T00:04:00’, 50)
AS tab (m, t, c)
{code}
{code:java}
http_requests 2026-01-01 00:00:00 NULL
http_requests 2026-01-01 00:01:00 100
http_requests 2026-01-01 00:02:00 200
http_requests 2026-01-01 00:04:00 NULL (counter: 400 -> 50)
{code}
*_Example with a counter reset with increased start_time_*
{code:sql}
SELECT m, t, counter_diff(c, st) OVER (PARTITION BY m ORDER BY t)
FROM VALUES
(‘http_requests’, TIMESTAMP ‘2026-01-01T00:01:00’,
TIMESTAMP ‘2026-01-01T00:00:00’, 100),
(‘http_requests’, TIMESTAMP ‘2026-01-01T00:02:00’,
TIMESTAMP ‘2026-01-01T00:00:00’, 200),
(‘http_requests’, TIMESTAMP ‘2026-01-01T00:04:00’,
TIMESTAMP ‘2026-01-01T00:02:31’, 400)
AS tab (m, t, st, c)
{code}
{code:java}
http_requests 2026-01-01 00:01:00 NULL
http_requests 2026-01-01 00:02:00 100
http_requests 2026-01-01 00:04:00 NULL (start_time: 00:00 -> 02:31)
{code}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]