Max Gekk created SPARK-57586:
--------------------------------

             Summary: Support the ANSI OVERLAPS predicate for datetime periods 
(including TIME)
                 Key: SPARK-57586
                 URL: https://issues.apache.org/jira/browse/SPARK-57586
             Project: Spark
          Issue Type: Sub-task
          Components: SQL
    Affects Versions: 4.3.0
            Reporter: Max Gekk


h2. What

Implement the ANSI SQL {{<overlaps predicate>}} (ISO/IEC 9075-2, Subclause 8.x;
concept in 4.6.4): determine whether two chronological periods share at least 
one
point in time.

Syntax:
{code:sql}
(start1, end1)     OVERLAPS (start2, end2)
(start1, interval1) OVERLAPS (start2, interval2)
{code}

Each period is a row of two values: a start datetime and either an end datetime 
or an
interval (the period is then {{[start, start + interval)}}). Per ANSI, each 
period is
first normalized so the start is the earlier endpoint; a period of length 0 is 
the single
point {{start}}; two periods overlap iff they have a common point. The result 
is a boolean
(with the standard NULL rules for unknown endpoints).

Spark already reserves {{OVERLAPS}} as a keyword in the grammar
({{SqlBaseLexer.g4}} / {{SqlBaseParser.g4}}) but does not implement the 
predicate for any
type, so it is unavailable today.

h2. Why

* ANSI conformance: {{OVERLAPS}} is a standard predicate Spark currently lacks.
* Direct user value for the TIME type: testing whether time-of-day ranges 
intersect
  (shifts, business hours, bookings) is a primary TIME use case, e.g.
  {{(t_start, t_end) OVERLAPS (TIME'09:00', TIME'17:00')}}.

h2. Scope

* Parser: add the {{<overlaps predicate>}} production wired to the existing 
{{OVERLAPS}}
  keyword.
* Analysis/runtime: implement the ANSI overlap semantics (endpoint 
normalization, the
  point-period degenerate case, and NULL handling), reusing existing comparison 
and
  datetime + interval arithmetic.
* Endpoint types: support TIME endpoints (with the {{(start, interval)}} form 
using a
  day-time interval via the existing {{time + interval}} resolution); also 
handle the
  other datetime families (DATE, TIMESTAMP, TIMESTAMP_NTZ) since the predicate 
is
  type-agnostic. Both endpoints of a period must be the same/comparable 
datetime family.
* Tests: parser, analyzer (type checking / error cases), and golden-file 
coverage,
  including TIME periods and boundary cases (touching endpoints, zero-length 
periods,
  NULL endpoints).

h2. Out of scope

* Period/temporal-table features (application-time periods, {{PERIOD}} columns).
* Cross-family periods (e.g. a TIME endpoint against a TIMESTAMP endpoint).

h2. Acceptance criteria

* {{(TIME'09:00', TIME'12:00') OVERLAPS (TIME'11:00', TIME'13:00')}} returns 
{{true}};
  {{... OVERLAPS (TIME'12:00', TIME'13:00')}} returns {{false}} (touching, not 
overlapping).
* The {{(start, interval)}} form works for TIME with a day-time interval.
* Endpoint order is normalized (swapped start/end give the same result); NULL 
endpoints
  follow the ANSI three-valued rules.

h2. Note

The predicate is not TIME-specific - it applies equally to DATE/TIMESTAMP. It 
is filed
under the TIME umbrella because TIME is the primary motivation; the reporter 
may choose to
re-home it as a general SQL feature.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to