Max Gekk created SPARK-57843:
--------------------------------

             Summary: Support nanosecond-precision timestamps in streaming 
stateful operators
                 Key: SPARK-57843
                 URL: https://issues.apache.org/jira/browse/SPARK-57843
             Project: Spark
          Issue Type: Sub-task
          Components: Structured Streaming
    Affects Versions: 4.3.0
            Reporter: Max Gekk


This sub-task is part of the umbrella SPARK-56822 (timestamps with nanosecond 
precision).

h2. Problem
Streaming stateful operators assume that event times are microsecond-precision 
{{Long}} values: {{StreamingSymmetricHashJoinExec}} (~L747, ~L980-984) reads 
event times with {{getLong}} and scales the watermark to microseconds with 
{{watermarkMs * 1000}}; {{StreamingSessionWindowStateManager}} (~L135) 
hard-codes {{TimestampType}} in the state key schema; and 
{{SymmetricHashJoinStateManager}} reads event times via {{getLong}}. The 
{{RocksDBStateEncoder}} is schema-generic, but the operators above are not.
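
To illustrate why a microsecond {{Long}} comparison is not enough, here is a minimal Python sketch of the arithmetic only. It is not Spark code: the function names are hypothetical, and the {{<=}} eviction predicate is an assumption about the shape of the check, made for illustration.

```python
# Illustration only: these helpers are hypothetical and are not Spark APIs.
MICROS_PER_MILLI = 1_000
NANOS_PER_MICRO = 1_000
NANOS_PER_MILLI = 1_000_000


def truncate_to_micros(event_time_nanos: int) -> int:
    """Drop the sub-microsecond part, as a microsecond Long column would."""
    return event_time_nanos // NANOS_PER_MICRO


def is_evictable_micros(event_time_micros: int, watermark_ms: int) -> bool:
    """Assumed shape of a microsecond check: watermark scaled by 1000."""
    return event_time_micros <= watermark_ms * MICROS_PER_MILLI


watermark_ms = 1_700_000_000_000
watermark_nanos = watermark_ms * NANOS_PER_MILLI

# Two distinct event times, 1 ns and 999 ns after the watermark.
event_a_nanos = watermark_nanos + 1
event_b_nanos = watermark_nanos + 999

# After truncation both collapse onto the watermark's own microsecond, so the
# microsecond check cannot tell them apart from an event exactly at the watermark.
event_a_micros = truncate_to_micros(event_a_nanos)
event_b_micros = truncate_to_micros(event_b_nanos)
```

Under these assumptions, both events are treated as evictable even though each is strictly later than the watermark at nanosecond resolution.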

h2. Goal
Allow nanosecond event-time columns to flow through stream-stream join eviction 
and session-window state, preserving nanosecond resolution in state keys and 
eviction comparisons.

h2. Scope
Update the state schema and eviction/read paths in the listed operators to 
handle {{TimestampNanosVal}}.
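
As a rough sketch of what a nanosecond-preserving eviction comparison and state-key ordering might look like, the Python below models a timestamp as a (microseconds, nanoseconds-within-microsecond) pair. The ticket does not describe the layout of {{TimestampNanosVal}}, so the {{NanosTimestamp}} class, its fields, and the helper functions are all hypothetical stand-ins, not the actual Spark types.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class NanosTimestamp:
    """Hypothetical stand-in for a nanosecond timestamp value.

    Field order matters: order=True compares epoch_micros first, then
    nanos_of_micro, which gives chronological ordering.
    """
    epoch_micros: int
    nanos_of_micro: int  # always in the range 0..999

    @staticmethod
    def from_epoch_nanos(epoch_nanos: int) -> "NanosTimestamp":
        # divmod floors, so the remainder stays in 0..999 for negative inputs.
        micros, nanos = divmod(epoch_nanos, 1_000)
        return NanosTimestamp(micros, nanos)


def watermark_as_timestamp(watermark_ms: int) -> NanosTimestamp:
    """Lift a millisecond watermark to the nanosecond representation."""
    return NanosTimestamp(watermark_ms * 1_000, 0)


def is_evictable(event_time: NanosTimestamp, watermark_ms: int) -> bool:
    """Assumed predicate: evict state whose event time is at or before the watermark."""
    return event_time <= watermark_as_timestamp(watermark_ms)


watermark_ms = 1_700_000_000_000
at_watermark = NanosTimestamp.from_epoch_nanos(watermark_ms * 1_000_000)
just_after = NanosTimestamp.from_epoch_nanos(watermark_ms * 1_000_000 + 1)
```

In this sketch, {{is_evictable(at_watermark, watermark_ms)}} holds while {{is_evictable(just_after, watermark_ms)}} does not, and sorting the keys keeps them in chronological order.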

h2. Acceptance criteria
* Stream-stream joins and session windows that are keyed on, or bounded by, 
nanosecond event time evict state and emit results correctly.

h2. Testing
{{StreamingJoinSuite}}, {{StreamingSessionWindowSuite}}.

h2. Dependencies
Start this sub-task after SPARK-57830 (event-time watermark on nanosecond 
columns) and SPARK-57829 (window/session_window over nanosecond timestamps).



