[
https://issues.apache.org/jira/browse/SPARK-5467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14296032#comment-14296032
]
Sean Owen commented on SPARK-5467:
----------------------------------
Is this about the same as SPARK-4392?
> DStreams should provide windowing based on timestamps from the data (as
> opposed to wall clock time)
> ---------------------------------------------------------------------------------------------------
>
> Key: SPARK-5467
> URL: https://issues.apache.org/jira/browse/SPARK-5467
> Project: Spark
> Issue Type: New Feature
> Components: Streaming
> Reporter: Imran Rashid
>
> DStreams currently only let you window based on wall clock time. This
> doesn't work very well when you're loading historical logs that are already
> sitting around, because they'll all go into one window. DStreams should
> provide a way for you to window based on a field of the incoming data. This
> would be useful if you want to either (1) bootstrap a streaming app from some
> logs or (2) test out the behavior of your app on historical logs, eg. for
> correctness or performance.
> I think there are some open questions here, such as whether the input data
> sources need to be sorted by time, how batches get triggered etc., but it
> seems like an important use case.
> This just came up on the mailing list:
> http://apache-spark-user-list.1001560.n3.nabble.com/reduceByKeyAndWindow-but-using-log-timestamps-instead-of-clock-seconds-td21405.html
> And I think it is also what was this Jira was getting at:
> https://issues.apache.org/jira/browse/SPARK-4427
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]