[ 
https://issues.apache.org/jira/browse/SPARK-5467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14296032#comment-14296032
 ] 

Sean Owen commented on SPARK-5467:
----------------------------------

Is this about the same as SPARK-4392?

> DStreams should provide windowing based on timestamps from the data (as 
> opposed to wall clock time)
> ---------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-5467
>                 URL: https://issues.apache.org/jira/browse/SPARK-5467
>             Project: Spark
>          Issue Type: New Feature
>          Components: Streaming
>            Reporter: Imran Rashid
>
> DStreams currently only let you window based on wall clock time.  This 
> doesn't work very well when you're loading historical logs that are already 
> sitting around, because they'll all go into one window.  DStreams should 
> provide a way for you to window based on a field of the incoming data.  This 
> would be useful if you want to either (1) bootstrap a streaming app from some 
> logs or (2) test out the behavior of your app on historical logs, eg. for 
> correctness or performance.
> I think there are some open questions here, such as whether the input data 
> sources need to be sorted by time, how batches get triggered etc., but it 
> seems like an important use case.
> This just came up on the mailing list: 
> http://apache-spark-user-list.1001560.n3.nabble.com/reduceByKeyAndWindow-but-using-log-timestamps-instead-of-clock-seconds-td21405.html
> And I think it is also what was this Jira was getting at: 
> https://issues.apache.org/jira/browse/SPARK-4427



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to