Imran Rashid created SPARK-5467:
-----------------------------------

             Summary: DStreams should provide windowing based on timestamps 
from the data (as opposed to wall clock time)
                 Key: SPARK-5467
                 URL: https://issues.apache.org/jira/browse/SPARK-5467
             Project: Spark
          Issue Type: New Feature
          Components: Streaming
            Reporter: Imran Rashid


DStreams currently only let you window based on wall clock time.  This doesn't 
work well when you're loading historical logs that are already sitting around, 
because they'll all land in the same window.  DStreams should provide a way 
for you to window based on a timestamp field in the incoming data.  This would 
be useful if you want to either (1) bootstrap a streaming app from some logs 
or (2) test out the behavior of your app on historical logs, e.g. for 
correctness or performance.
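
To make the request concrete, here is a minimal sketch in plain Python (not 
Spark code, and not a proposed API) of what windowing on a data field could 
mean: each record is assigned to a tumbling window from its own timestamp, so 
historical logs read in one go still spread across several windows.  The 
names window_start and count_by_event_time_window and the sample records are 
hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def window_start(event_time, window_seconds):
    """Return the start (epoch seconds) of the tumbling window holding event_time."""
    return event_time - (event_time % window_seconds)


def count_by_event_time_window(records, window_seconds):
    """Count records per (window_start, key) using each record's own timestamp."""
    counts = defaultdict(int)
    for event_time, key in records:
        counts[(window_start(event_time, window_seconds), key)] += 1
    return dict(counts)


# Hypothetical historical log records, all available at once: (epoch_seconds, key)
historical_logs = [(100, "a"), (105, "a"), (112, "b"), (125, "a")]

# With 10-second windows the records fall into three windows, even though
# they were all "received" at the same wall clock time.
result = count_by_event_time_window(historical_logs, window_seconds=10)
```

A wall-clock window would put all four records in one bucket; keying on the 
record's timestamp is what separates them.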

I think there are some open questions here, such as whether the input data 
sources need to be sorted by time and how batches get triggered, but it seems 
like an important use case.
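
The sorting and triggering questions can be illustrated with another small 
plain-Python sketch.  It assumes one possible trigger rule, picked only for 
illustration: a window is emitted as soon as a record from a later window 
arrives.  Under that assumption, sorted input works, but an out-of-order 
record shows up after its window has already been emitted.  The function 
run_event_time_trigger and the sample data are hypothetical.

```python
def run_event_time_trigger(records, window_seconds):
    """Emit a window once a record from a later window is seen.

    Returns (emitted, late): emitted is a list of (window_start, records) in
    emission order; late holds records whose window was already emitted.
    """
    emitted = []
    late = []
    current_start = None
    current = []
    for event_time, value in records:
        start = event_time - (event_time % window_seconds)
        if current_start is None:
            current_start = start
        if start > current_start:
            # A newer window has begun, so close and emit the current one.
            emitted.append((current_start, current))
            current_start, current = start, []
        if start < current_start:
            # This record belongs to a window that was already emitted.
            late.append((event_time, value))
        else:
            current.append((event_time, value))
    if current_start is not None:
        # Flush the last open window at end of input.
        emitted.append((current_start, current))
    return emitted, late


sorted_input = [(100, "x"), (105, "y"), (112, "z")]
unsorted_input = [(100, "x"), (112, "z"), (105, "y")]

sorted_emitted, sorted_late = run_event_time_trigger(sorted_input, 10)
unsorted_emitted, unsorted_late = run_event_time_trigger(unsorted_input, 10)
```

With the unsorted input, the record at time 105 ends up in the late list, 
which is the kind of case any real design would have to decide how to handle.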

This just came up on the mailing list: 
http://apache-spark-user-list.1001560.n3.nabble.com/reduceByKeyAndWindow-but-using-log-timestamps-instead-of-clock-seconds-td21405.html

And I think it is also what this Jira was getting at: 
https://issues.apache.org/jira/browse/SPARK-4427



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
