Spark Streaming reduceByKeyAndWindow with inverse function seems to iterate over all the keys in the window even though they are not present in the current batch

SRK Mon, 26 Jun 2017 12:53:21 -0700

Hi,

We use reduceByKeyAndWindow with an inverse function in our Streaming job
to calculate rolling counts for the past hour and for the past 24 hours.
It seems to iterate over all the keys in the window, even those that are not
present in the current batch, which makes the processing times high. My batch
interval is 1 minute. Is there a way to make reduceByKeyAndWindow iterate only
over the keys present in the current batch instead of reducing over all the
keys in the window? Typically, updates happen only for the keys present in the
current batch.
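To illustrate the behavior described above, here is a simplified pure-Python model (not Spark code) of a windowed count maintained with an inverse function. It assumes, as I understand Spark's implementation, that each slide combines the previous window's state with the batch entering the window and the batch leaving it, so every key already in the state is walked even when its count does not change. The names InverseReduceWindow, reduce_batch, drop_zero, keys_visited and keys_changed are hypothetical and exist only for this illustration; drop_zero loosely mirrors the optional filter function that reduceByKeyAndWindow accepts for discarding keys.

```python
from collections import Counter, deque
from typing import Deque, Dict, List, Tuple

# Simplified model, NOT Spark code. Assumption: each slide combines the previous
# window state with the batch entering the window and the batch leaving it, and
# walks every key of that previous state (as a cogroup would).

Batch = List[Tuple[str, int]]


def reduce_batch(batch: Batch) -> Counter:
    """Sum values per key within one micro-batch (the reduceByKey step)."""
    totals: Counter = Counter()
    for key, value in batch:
        totals[key] += value
    return totals


class InverseReduceWindow:
    """Hypothetical helper: rolling per-key sums over the last `window_batches` batches."""

    def __init__(self, window_batches: int, drop_zero: bool = False) -> None:
        self.window_batches = window_batches
        self.drop_zero = drop_zero  # loosely mirrors a filter that discards keys
        self.batches: Deque[Counter] = deque()
        self.state: Dict[str, int] = {}
        self.keys_visited = 0  # keys walked in the latest slide
        self.keys_changed = 0  # keys whose sum could change in the latest slide

    def slide(self, batch: Batch) -> Dict[str, int]:
        new = reduce_batch(batch)
        self.batches.append(new)
        # The oldest batch leaves once the window holds more than window_batches.
        old = self.batches.popleft() if len(self.batches) > self.window_batches else Counter()

        all_keys = set(self.state) | set(new) | set(old)
        self.keys_visited = len(all_keys)
        self.keys_changed = len(set(new) | set(old))

        updated: Dict[str, int] = {}
        for key in all_keys:
            # Previous sum, plus the entering batch (reduce), minus the leaving batch (inverse reduce).
            value = self.state.get(key, 0) + new[key] - old[key]
            if self.drop_zero and value == 0:
                continue  # without this, keys stay in the state with a count of 0
            updated[key] = value
        self.state = updated
        return self.state


if __name__ == "__main__":
    window = InverseReduceWindow(window_batches=60)  # 1-hour window of 1-minute batches
    window.slide([(f"k{i}", 1) for i in range(1000)])  # 1,000 keys in the first batch
    window.slide([("a", 1)])  # only one key in the next batch
    print(window.keys_visited, window.keys_changed)  # 1001 1
```

In this model the second slide walks 1,001 keys although only the key in the new batch (plus any keys in the batch leaving the window) can actually change, which matches the cost pattern described above.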


Thanks!



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-reduceByKeyAndWindow-with-inverse-function-seems-to-iterate-over-all-the-keys-in-theh-tp28792.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

