Hi, We use reduceByKeyAndWindow with an inverse function in our Streaming job to calculate rolling counts for the past hour and for the past 24 hours. It seems to iterate over all the keys in the window, even those that are not present in the current batch, which makes the processing times high. My batch interval is 1 minute. Is there a way to make reduceByKeyAndWindow iterate only over the keys present in the current batch, instead of reducing over all the keys in the window? Typically, updates happen only for the keys present in the current batch.
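To make the question concrete, here is a minimal plain-Python sketch of the behaviour I am hoping for. It does not use Spark and is not how Spark implements the operation; the function name rolling_counts and its parameters are hypothetical, chosen only for illustration. The idea is that a rolling count with an inverse function should only need to touch the keys in the batch entering the window and the batch leaving it.

```python
from collections import Counter, deque


def rolling_counts(batches, window_len):
    """Toy model of an incremental windowed count with an inverse function.

    batches: iterable of dicts {key: count}, one per batch interval.
    window_len: number of batches that make up the window.
    Yields (window_state, keys_touched) after each batch.
    """
    window = Counter()   # running per-key totals for the current window
    in_window = deque()  # batches currently inside the window

    for batch in batches:
        touched = set(batch)

        # "Reduce" step: add the counts of the batch entering the window.
        for key, n in batch.items():
            window[key] += n
        in_window.append(batch)

        # "Inverse reduce" step: subtract the batch leaving the window.
        if len(in_window) > window_len:
            old = in_window.popleft()
            touched |= set(old)
            for key, n in old.items():
                window[key] -= n
                if window[key] == 0:
                    del window[key]  # drop keys that fell out of the window

        yield dict(window), touched


if __name__ == "__main__":
    sample = [{"a": 1, "b": 2}, {"a": 3}, {"c": 1}]
    for state, touched in rolling_counts(sample, window_len=2):
        print(state, sorted(touched))
```

With a 1 minute batch interval, window_len would be 60 for the hourly count and 1440 for the 24 hour count. In this model the work per batch depends only on the keys in the entering and leaving batches, not on the total number of keys held in the window.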
Thanks!