Max Gekk created SPARK-57586:
--------------------------------
Summary: Support the ANSI OVERLAPS predicate for datetime periods
(including TIME)
Key: SPARK-57586
URL: https://issues.apache.org/jira/browse/SPARK-57586
Project: Spark
Issue Type: Sub-task
Components: SQL
Affects Versions: 4.3.0
Reporter: Max Gekk
h2. What
Implement the ANSI SQL {{<overlaps predicate>}} (ISO/IEC 9075-2, Subclause 8.x;
concept in 4.6.4): determine whether two chronological periods share at least
one
point in time.
Syntax:
{code:sql}
(start1, end1) OVERLAPS (start2, end2)
(start1, interval1) OVERLAPS (start2, interval2)
{code}
Each period is a row of two values: a start datetime and either an end datetime
or an
interval (the period is then {{[start, start + interval)}}). Per ANSI, each
period is
first normalized so the start is the earlier endpoint; a period of length 0 is
the single
point {{start}}; two periods overlap iff they have a common point. The result
is a boolean
(with the standard NULL rules for unknown endpoints).
Spark already reserves {{OVERLAPS}} as a keyword in the grammar
({{SqlBaseLexer.g4}} / {{SqlBaseParser.g4}}) but does not implement the
predicate for any
type, so it is unavailable today.
h2. Why
* ANSI conformance: {{OVERLAPS}} is a standard predicate Spark currently lacks.
* Direct user value for the TIME type: testing whether time-of-day ranges
intersect
(shifts, business hours, bookings) is a primary TIME use case, e.g.
{{(t_start, t_end) OVERLAPS (TIME'09:00', TIME'17:00')}}.
h2. Scope
* Parser: add the {{<overlaps predicate>}} production wired to the existing
{{OVERLAPS}}
keyword.
* Analysis/runtime: implement the ANSI overlap semantics (endpoint
normalization, the
point-period degenerate case, and NULL handling), reusing existing comparison
and
datetime + interval arithmetic.
* Endpoint types: support TIME endpoints (with the {{(start, interval)}} form
using a
day-time interval via the existing {{time + interval}} resolution); also
handle the
other datetime families (DATE, TIMESTAMP, TIMESTAMP_NTZ) since the predicate
is
type-agnostic. Both endpoints of a period must be the same/comparable
datetime family.
* Tests: parser, analyzer (type checking / error cases), and golden-file
coverage,
including TIME periods and boundary cases (touching endpoints, zero-length
periods,
NULL endpoints).
h2. Out of scope
* Period/temporal-table features (application-time periods, {{PERIOD}} columns).
* Cross-family periods (e.g. a TIME endpoint against a TIMESTAMP endpoint).
h2. Acceptance criteria
* {{(TIME'09:00', TIME'12:00') OVERLAPS (TIME'11:00', TIME'13:00')}} returns
{{true}};
{{... OVERLAPS (TIME'12:00', TIME'13:00')}} returns {{false}} (touching, not
overlapping).
* The {{(start, interval)}} form works for TIME with a day-time interval.
* Endpoint order is normalized (swapped start/end give the same result); NULL
endpoints
follow the ANSI three-valued rules.
h2. Note
The predicate is not TIME-specific - it applies equally to DATE/TIMESTAMP. It
is filed
under the TIME umbrella because TIME is the primary motivation; the reporter
may choose to
re-home it as a general SQL feature.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]