Would mapWithState work for this? See https://databricks.com/blog/2016/02/01/faster-stateful-stream-processing-in-apache-spark-streaming.html
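
In case it helps, here is a minimal sketch of the per-message bookkeeping the question quoted below asks for. It is plain Python, not Spark code: as far as I know mapWithState is only available in the Scala and Java DStream APIs, so this only models what a per-key update function would have to do, with the message id as the key. The names REQUIRED, update and count_completed are made up for illustration, and the sketch assumes events arrive as (message id, state) pairs.

```python
from typing import Dict, FrozenSet, Iterable, Tuple

# Hypothetical: the states a message must visit before it is counted.
REQUIRED = ("s1", "s2", "s3")


def update(seen: FrozenSet[str], state: str, ordered: bool) -> Tuple[FrozenSet[str], bool]:
    """Fold one event into a message's state.

    Returns the new set of visited states and True only on the event
    that completes the required set.
    """
    # Ignore states we do not track and states already recorded.
    if state not in REQUIRED or state in seen:
        return seen, False
    # In ordered mode, accept only the next expected state.
    if ordered and state != REQUIRED[len(seen)]:
        return seen, False
    new_seen = seen | {state}
    return new_seen, len(new_seen) == len(REQUIRED)


def count_completed(events: Iterable[Tuple[str, str]], ordered: bool = False) -> int:
    """Count messages that have visited every state in REQUIRED."""
    visited: Dict[str, FrozenSet[str]] = {}  # per-message state, keyed by message id
    completed = 0
    for msg_id, state in events:
        seen, done = update(visited.get(msg_id, frozenset()), state, ordered)
        visited[msg_id] = seen
        if done:
            completed += 1
    return completed
```

The per-message set is kept after completion so a repeated visit does not increment the counter twice. In a real streaming job that state would need a timeout or cleanup policy, since the transitions can take an indefinite amount of time.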

On Thu, Jun 29, 2017 at 11:55 AM, kant kodali <kanth...@gmail.com> wrote:

> Hi All,
>
> Here is a problem, and I am wondering if Spark Streaming is the right tool
> for it.
>
> I have a stream of messages m1, m2, m3, ... and each of those messages can be
> in state s1, s2, s3, ..., sn (you can imagine the number of states is about
> 100). I want to compute some metrics over messages that visit all the states
> from s1 to sn, but these state transitions can take an indefinite amount of
> time. A simple example would be to count all messages that have visited
> states s1, s2 and s3. In other words, the transition function should know
> that, say, message m1 has visited states s1 and s2 but not s3 yet, and once
> message m1 visits s3 it should increment the counter by 1.
>
> If it makes anything easier, I can say a message has to visit s1 before
> visiting s2, and s2 before visiting s3, and so on, but I would like to know
> how to do this both with and without that ordering.
>
> Thanks!