[
https://issues.apache.org/jira/browse/SPARK-57829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated SPARK-57829:
-----------------------------------
Labels: pull-request-available (was: )
> Support window, session_window and window_time over nanosecond-precision
> timestamps
> -----------------------------------------------------------------------------------
>
> Key: SPARK-57829
> URL: https://issues.apache.org/jira/browse/SPARK-57829
> Project: Spark
> Issue Type: Sub-task
> Components: Structured Streaming
> Affects Versions: 4.3.0
> Reporter: Max Gekk
> Priority: Major
> Labels: pull-request-available
>
> This sub-task is part of the umbrella SPARK-56822 (timestamps with nanosecond
> precision).
> h2. Problem
> {{TimeWindow}} (expressions/TimeWindow.scala ~L100-111) and {{SessionWindow}}
> (expressions/SessionWindow.scala ~L70-81) accept only {{AnyTimestampType}}
> (microsecond) time columns, and window resolution ({{TimeWindowResolution}})
> uses the identity microsecond {{PreciseTimestampConversion}}. {{window_time}}
> inherits the window struct element type. As a result, nanosecond time columns
> are rejected at analysis, and bucketing is microsecond-based. This applies to
> both batch and streaming.
> h2. Goal
> Support nanosecond time columns in tumbling/sliding {{window}},
> {{session_window}}, and {{window_time}}, with bucket boundaries computed at
> the source precision.
> h2. Scope
> Accept {{AnyTimestampNanoType}} in the window input-type checks; extend the
> resolution/rewrite to compute buckets from {{TimestampNanosVal}}; ensure the
> produced window struct {{start}} / {{end}} types are consistent (nanosecond,
> or a documented microsecond rounding).
> h2. Acceptance criteria
> * {{GROUP BY window(ts_nanos, '1 second')}} and {{session_window(ts_nanos,
> ...)}} analyze and produce correct buckets; {{window_time}} returns a
> consistent type.
> h2. Testing
> {{DataFrameTimeWindowingSuite}}, {{DataFrameSessionWindowingSuite}};
> streaming window tests.
> h2. Dependencies
> No hard dependencies. Day-time bucketing relies on the already resolved
> SPARK-57501, and the year-month case reuses the year-month interval sub-task.
> This sub-task is a prerequisite for the streaming stateful-operators sub-task.
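
To illustrate what "bucket boundaries computed at the source precision" could
mean for a tumbling window, here is a minimal Python sketch that buckets
integer nanoseconds since the epoch. It is not Spark's implementation: the
names tumbling_window and round_half_up_to_micros, and the half-up rounding
policy, are assumptions made only for this example. It shows that a
nanosecond timestamp just before a window boundary stays in its bucket when
bucketed directly, but moves to the next bucket if it is first rounded to the
nearest microsecond.

```python
from typing import Tuple

NANOS_PER_MICRO = 1_000
NANOS_PER_SECOND = 1_000_000_000


def tumbling_window(ts: int, duration: int, start_offset: int = 0) -> Tuple[int, int]:
    """Return the half-open window [start, end) that contains ts.

    All arguments are integers in the same unit (nanoseconds here).
    Hypothetical helper for illustration; not a Spark API.
    """
    if duration <= 0:
        raise ValueError("duration must be positive")
    # Python's % yields a non-negative remainder for a positive divisor,
    # so timestamps before the epoch are floored to the correct bucket.
    remainder = (ts - start_offset) % duration
    start = ts - remainder
    return start, start + duration


def round_half_up_to_micros(ts_nanos: int) -> int:
    """Round a nanosecond timestamp to the nearest microsecond, in nanoseconds.

    The half-up policy is an assumption; the ticket does not specify one.
    """
    return ((ts_nanos + NANOS_PER_MICRO // 2) // NANOS_PER_MICRO) * NANOS_PER_MICRO


# An event 400 ns before the 2-second boundary.
ts_nanos = 1_999_999_600

# Bucketed at nanosecond precision: stays in [1 s, 2 s).
exact = tumbling_window(ts_nanos, NANOS_PER_SECOND)

# Rounded to the nearest microsecond first: lands in [2 s, 3 s).
rounded = tumbling_window(round_half_up_to_micros(ts_nanos), NANOS_PER_SECOND)

print(exact)    # (1000000000, 2000000000)
print(rounded)  # (2000000000, 3000000000)
```

Truncating toward negative infinity instead of rounding would keep the event
in the same bucket whenever the window duration and offset are whole
microseconds, which is one reason the rounding rule for the window struct
{{start}} / {{end}} values needs to be documented if microsecond output is
chosen.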