Max Gekk created SPARK-57830:
--------------------------------
Summary: Support event-time watermark on nanosecond-precision
timestamp columns
Key: SPARK-57830
URL: https://issues.apache.org/jira/browse/SPARK-57830
Project: Spark
Issue Type: Sub-task
Components: Structured Streaming
Affects Versions: 4.3.0
Reporter: Max Gekk
This sub-task is part of the umbrella SPARK-56822 (timestamps with nanosecond
precision).
h2. Problem
{{CheckAnalysis}} (analysis/CheckAnalysis.scala ~L651-661) accepts an
event-time column only if it is {{TimestampType}} (or a window struct whose
{{end}} is {{TimestampType}}); a nanosecond event-time column fails with
{{EVENT_TIME_IS_NOT_ON_TIMESTAMP_TYPE}}. Downstream, the watermark predicate in
{{statefulOperators.scala}} (~L672-680) builds {{Literal(watermarkMs * 1000)}}
and compares it as a microsecond {{Long}}, and dedup-within-watermark reads the
event time via {{getLong}}. Both assume an 8-byte microsecond value, which is
incompatible with the 16-byte {{TimestampNanosVal}}.
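For illustration, the microsecond assumption described above can be sketched in plain Scala. This is not the code in {{statefulOperators.scala}}: the object and method names are hypothetical, and the {{<=}} comparison direction is an assumption about how the eviction predicate is built.

```scala
// Hypothetical stand-in for the existing microsecond-only predicate.
object MicrosWatermarkSketch {
  // The watermark is tracked in epoch milliseconds, while a TimestampType
  // event time is a Long holding epoch microseconds.
  def isLate(eventTimeMicros: Long, watermarkMs: Long): Boolean = {
    // Mirrors Literal(watermarkMs * 1000): scale milliseconds to microseconds.
    // multiplyExact throws on overflow instead of wrapping silently.
    val thresholdMicros = Math.multiplyExact(watermarkMs, 1000L)
    // Assumed direction: a row at or below the threshold counts as late.
    eventTimeMicros <= thresholdMicros
  }
}
```

The comparison only sees a single {{Long}}, so a sub-microsecond remainder has no place in it.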
h2. Goal
Allow a nanosecond timestamp column as the event-time / watermark column, with
the watermark threshold compared correctly against the nanosecond value.
h2. Scope
Extend the {{CheckAnalysis}} event-time type check to accept
{{AnyTimestampNanoType}}; make the watermark predicate and eviction
read/compare the nanosecond value (epoch micros + remainder) rather than
assuming a microsecond {{Long}}.
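A minimal sketch of the intended comparison, assuming the nanosecond value is held as epoch microseconds plus a non-negative remainder in the range 0 to 999. {{NanosTs}} and its field names are hypothetical stand-ins, not the actual {{TimestampNanosVal}} layout, and the {{<=}} direction is again an assumption.

```scala
// Hypothetical model of a nanosecond timestamp: epoch micros + remainder.
object NanosWatermarkSketch {
  final case class NanosTs(epochMicros: Long, nanosOfMicro: Int) {
    require(nanosOfMicro >= 0 && nanosOfMicro < 1000,
      s"remainder must be in [0, 999], got $nanosOfMicro")
  }

  // Lexicographic order: compare micros first, then the nanosecond remainder.
  // This is only valid because the remainder is assumed non-negative.
  implicit val nanosOrdering: Ordering[NanosTs] =
    Ordering.by((t: NanosTs) => (t.epochMicros, t.nanosOfMicro))

  // The watermark is in epoch milliseconds, so its remainder is always zero.
  def watermarkThreshold(watermarkMs: Long): NanosTs =
    NanosTs(Math.multiplyExact(watermarkMs, 1000L), 0)

  // Assumed direction: a row at or below the threshold counts as late.
  def isLate(eventTime: NanosTs, watermarkMs: Long): Boolean =
    nanosOrdering.lteq(eventTime, watermarkThreshold(watermarkMs))
}
```

Under these assumptions, an event one nanosecond past the threshold is kept, whereas a microsecond-only comparison would treat it as equal to the threshold and drop it.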
h2. Acceptance criteria
* {{withWatermark}} on a nanosecond column passes analysis; late-record
dropping and watermark advancement are correct to nanosecond resolution.
h2. Testing
{{EventTimeWatermarkSuite}} and streaming dedup tests with nanosecond event
time.
h2. Dependencies
No hard dependencies. This sub-task is a prerequisite for the streaming
stateful-operators sub-task.