[ 
https://issues.apache.org/jira/browse/SPARK-57830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18093064#comment-18093064
 ] 

Anupam Yadav commented on SPARK-57830:
--------------------------------------

I would like to work on this. Plan: extend event-time watermark support 
(withWatermark / EventTimeWatermark) to accept nanosecond-precision timestamp 
columns as the event-time column, converting the nanosecond values consistently 
with the existing microsecond-timestamp watermark path, with tests. Will put up 
a PR shortly.

> Support event-time watermark on nanosecond-precision timestamp columns
> ----------------------------------------------------------------------
>
>                 Key: SPARK-57830
>                 URL: https://issues.apache.org/jira/browse/SPARK-57830
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Structured Streaming
>    Affects Versions: 4.3.0
>            Reporter: Max Gekk
>            Priority: Major
>
> This sub-task is part of the umbrella SPARK-56822 (timestamps with nanosecond 
> precision).
> h2. Problem
> {{CheckAnalysis}} (analysis/CheckAnalysis.scala ~L651-661) accepts an 
> event-time column only if it is {{TimestampType}} (or a window struct whose 
> {{end}} is {{TimestampType}}); a nanosecond event-time column fails with 
> {{EVENT_TIME_IS_NOT_ON_TIMESTAMP_TYPE}}. Downstream, the watermark predicate 
> in {{statefulOperators.scala}} (~L672-680) builds {{Literal(watermarkMs * 
> 1000)}} and compares it as a microsecond {{Long}}, and dedup-within-watermark 
> reads the event time via {{getLong}} - incompatible with the 16-byte 
> {{TimestampNanosVal}}.
> h2. Goal
> Allow a nanosecond timestamp column as the event-time / watermark column, 
> with the watermark threshold compared correctly against the nanosecond value.
> h2. Scope
> Extend the {{CheckAnalysis}} event-time type check to accept 
> {{AnyTimestampNanoType}}; make the watermark predicate and eviction 
> read/compare the nanosecond value (epoch micros + remainder) rather than 
> assuming a microsecond {{Long}}.
> h2. Acceptance criteria
> * {{withWatermark}} on a nanosecond column analyzes; late-record dropping and 
> watermark advancement are correct to nanosecond resolution.
> h2. Testing
> {{EventTimeWatermarkSuite}} and streaming dedup tests with nanosecond event 
> time.
> h2. Dependencies
> None hard. PREREQ for the streaming stateful-operators sub-task.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to