Hi Does updateStateByKey pass elements to updateFunc (in Seq[V]) in order in which they appear in the RDD? My guess is no which means updateFunc needs to be commutative. Am I correct? I've asked this question before but there were no takers.
Here's the scala docs for updateStateByKey /** * Return a new "state" DStream where the state for each key is updated by applying * the given function on the previous state of the key and the new values of each key. * Hash partitioning is used to generate the RDDs with Spark's default number of partitions. * @param updateFunc State update function. If `this` function returns None, then * corresponding state key-value pair will be eliminated. * @tparam S State type */ def updateStateByKey[S: ClassTag]( updateFunc: (Seq[V], Option[S]) => Option[S] ): DStream[(K, S)] = { updateStateByKey(updateFunc, defaultPartitioner()) }