Did you consider the updateStateByKey operation?
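For a running word count since midnight, updateStateByKey keeps a per-key running total across batches. Below is a minimal sketch: the update function is the core, and the DStream wiring (names like `ssc` and `lines` are assumptions here) is shown only as comments, since it needs a live Spark Streaming context and a checkpoint directory to run.

```python
# Sketch of the update function that updateStateByKey expects (Python API).
# Spark calls it per key with all new counts from the current batch plus the
# previous running total, and stores whatever it returns as the new state.

def update_running_count(new_values, running_count):
    # running_count is None the first time a key is seen
    return sum(new_values) + (running_count or 0)

# Assumed streaming wiring (not runnable without a StreamingContext):
#
#   ssc.checkpoint(checkpoint_dir)  # checkpointing is required for state
#   pairs = lines.flatMap(lambda l: l.split()).map(lambda w: (w, 1))
#   totals = pairs.updateStateByKey(update_running_count)
#   totals.pprint()

# Simulating two batches for one word:
after_batch1 = update_running_count([1, 1, 1], None)         # -> 3
after_batch2 = update_running_count([1, 1], after_batch1)    # -> 5
print(after_batch1, after_batch2)
```

Because the state survives from batch to batch, each batch's output is the cumulative count so far, not just the window's count. To reset at 0:00 you would restart the job (or have the update function drop state based on the date).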

 

From: Sandeep Giri [mailto:[email protected]] 
Sent: Thursday, October 29, 2015 3:09 PM
To: user <[email protected]>; dev <[email protected]>
Subject: Maintaining overall cumulative data in Spark Streaming

 

Dear All,

 

If a continuous stream of text is coming in and you have to keep publishing the 
overall word count so far since 0:00 today, what would you do?

 

Publishing the results for a window is easy, but if we have to keep aggregating 
the results, how should we go about it?

 

I have tried keeping a StreamRDD with the aggregated counts and repeatedly doing a 
fullOuterJoin against it, but that didn't work. It seems the StreamRDD gets reset.

 

Kindly help.

 

Regards,

Sandeep Giri

 
