[ 
https://issues.apache.org/jira/browse/SPARK-26008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16684560#comment-16684560
 ] 

Hyukjin Kwon commented on SPARK-26008:
--------------------------------------

Discussing a rough idea or wish should ideally start from a dev mailing list. 
It is possible to propose new features as well. These are generally not helpful 
unless accompanied by detail, such as a design document and/or code change. If 
you're not going to work on this, please don't reopen but start it from mailing 
list.

> Structured Streaming Manual clock for simulation
> ------------------------------------------------
>
>                 Key: SPARK-26008
>                 URL: https://issues.apache.org/jira/browse/SPARK-26008
>             Project: Spark
>          Issue Type: Wish
>          Components: Structured Streaming
>    Affects Versions: 2.3.0, 2.3.1, 2.3.2, 2.4.0
>            Reporter: Tom Bar Yacov
>            Priority: Major
>
> Structured streaming Internal {color:#333333}StreamTest{color} class allows 
> to test incremental logic and verify outputs between multiple triggers. It 
> support changing the internal spark clock to get full deterministic 
> simulation of the incremental state and APIs. This is not possible outside 
> tests since {color:#333333}DataStreamWriter{color} hides the triggerClock 
> parameter and is final.
> This can be very useful not only in unit test mode but also for a real 
> running query. for example when you have all the Kafka historical data 
> persisted to hdfs with its Kafka timestamp and you want to "play"  the data 
> and simulate the streaming application output as if  running on this data in 
> live streaming including incremental output between triggers.
> Currently I can simulate multiple triggers and incremental logic for some of 
> the APIs, but for APIs that depend on the execution clock like 
> {color:#333333}mapGroupsWithState{color} with execution based timeout I did 
> not find a way to do this.
> I would like to allow passing an externally controlled clock as parameter to 
> DataStreamWriter and to the query itself.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to