[ https://issues.apache.org/jira/browse/SPARK-26008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16684560#comment-16684560 ]
Hyukjin Kwon commented on SPARK-26008:
--------------------------------------
A rough idea or wish should ideally be discussed on the dev mailing list
first. New features can be proposed here as well, but such proposals are
generally not helpful unless accompanied by detail, such as a design document
and/or a code change. If you're not going to work on this, please don't reopen
the issue; start the discussion on the mailing list instead.
> Structured Streaming Manual clock for simulation
> ------------------------------------------------
>
> Key: SPARK-26008
> URL: https://issues.apache.org/jira/browse/SPARK-26008
> Project: Spark
> Issue Type: Wish
> Components: Structured Streaming
> Affects Versions: 2.3.0, 2.3.1, 2.3.2, 2.4.0
> Reporter: Tom Bar Yacov
> Priority: Major
>
> Structured Streaming's internal StreamTest class allows testing incremental
> logic and verifying outputs between multiple triggers. It supports changing
> the internal Spark clock to get a fully deterministic simulation of the
> incremental state and APIs. This is not possible outside tests because
> DataStreamWriter hides the triggerClock parameter and is final.
> This can be very useful not only in unit tests but also for a real running
> query, for example when you have all the historical Kafka data persisted to
> HDFS with its Kafka timestamps and you want to "play" the data and simulate
> the streaming application's output as if it were running on this data in
> live streaming, including incremental output between triggers.
> Currently I can simulate multiple triggers and incremental logic for some of
> the APIs, but for APIs that depend on the execution clock, like
> mapGroupsWithState with an execution-clock-based timeout, I did not find a
> way to do this.
> I would like to allow passing an externally controlled clock as a parameter
> to DataStreamWriter and to the query itself.
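
To illustrate the request above, here is a minimal, Spark-independent sketch
in Python of what an externally controlled clock makes possible. ManualClock,
SessionSimulator and replay are hypothetical names invented for this example,
not Spark APIs, and the per-key timeout only loosely imitates a clock-based
timeout such as the one mapGroupsWithState offers. Because time moves only
when the caller advances it, replaying the same batches should always yield
the same incremental output per trigger.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


class ManualClock:
    """Hypothetical clock whose time moves only when the caller advances it."""

    def __init__(self, start_ms: int = 0) -> None:
        self._now_ms = start_ms

    def now_ms(self) -> int:
        return self._now_ms

    def advance(self, delta_ms: int) -> int:
        if delta_ms < 0:
            raise ValueError("clock cannot move backwards")
        self._now_ms += delta_ms
        return self._now_ms


@dataclass
class SessionSimulator:
    """Toy stand-in for per-key state with a clock-based timeout (assumption)."""

    clock: ManualClock
    timeout_ms: int
    # key -> (event count, deadline in clock milliseconds)
    state: Dict[str, Tuple[int, int]] = field(default_factory=dict)

    def run_trigger(self, batch: List[str]) -> List[Tuple[str, int]]:
        """Process one micro-batch of keys; return (key, count) per expired key."""
        now = self.clock.now_ms()
        # Expire keys with no data in this batch whose deadline has passed.
        expired = sorted(
            (key, count)
            for key, (count, deadline) in self.state.items()
            if key not in batch and deadline <= now
        )
        for key, _ in expired:
            del self.state[key]
        # Count events and push the deadline forward for keys seen in this batch.
        for key in batch:
            count, _ = self.state.get(key, (0, 0))
            self.state[key] = (count + 1, now + self.timeout_ms)
        return expired


def replay() -> List[List[Tuple[str, int]]]:
    """Replay four triggers over 'historical' batches with simulated time."""
    clock = ManualClock()
    sim = SessionSimulator(clock, timeout_ms=10_000)
    outputs = [sim.run_trigger(["a", "b", "a"])]  # clock at 0 ms
    clock.advance(5_000)
    outputs.append(sim.run_trigger(["b"]))        # clock at 5000 ms
    clock.advance(6_000)
    outputs.append(sim.run_trigger([]))           # clock at 11000 ms
    clock.advance(5_000)
    outputs.append(sim.run_trigger([]))           # clock at 16000 ms
    return outputs
```

Calling replay() returns one list per trigger: nothing for the first two
triggers, then the expired state for key "a", then the one for key "b". No
wall-clock time is consulted, so the result does not depend on how fast the
replay runs.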