Hi,

I am working on a use case where clients walk in and out of geofences and we send messages based on that. I currently have some in-memory broadcast variables to do lookups for client and geofence info; some of this also comes from Cassandra. My current quandary is that I need to support the case where a user goes in and out of a geofence, while also tracking how many messages have already been sent and applying some logic based on that.
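The enter/exit tracking with a per-member message count could be expressed as the update function passed to Spark Streaming's `updateStateByKey`, which calls it per key with the batch's new values and the previous state, and drops the key when it returns `None`. This is only a sketch: the state shape, the daily cap of 3, and the field names are my assumptions, not anything established in this thread.

```python
import time

DAILY_WINDOW_MS = 24 * 60 * 60 * 1000  # invalidate message counters after a day
DAILY_CAP = 3  # arbitrary daily cap on messages per member (assumption)

def update_member_state(events, state, now_ms=None):
    """Shaped like the function given to DStream.updateStateByKey:
    (new_values, previous_state) -> new_state; returning None "pops"
    the key from the state RDD so stale entries do not accumulate.

    events: list of "enter"/"exit" strings seen for this member this batch
    state:  dict like {"inside": bool, "sent": int, "ts": int}, or None
    """
    now_ms = int(time.time() * 1000) if now_ms is None else now_ms
    if not events:
        # no new activity: expire the entry once it is older than a day
        if state is None or now_ms - state["ts"] >= DAILY_WINDOW_MS:
            return None
        return state
    # reset the daily counter if the previous state has aged out
    if state is None or now_ms - state["ts"] >= DAILY_WINDOW_MS:
        state = {"inside": False, "sent": 0, "ts": now_ms}
    inside, sent = state["inside"], state["sent"]
    for ev in events:
        if ev == "enter":
            if not inside and sent < DAILY_CAP:
                sent += 1  # a real job would also emit the message here
            inside = True
        elif ev == "exit":
            inside = False
    return {"inside": inside, "sent": sent, "ts": now_ms}
```

Wired up it would look roughly like `parsed.updateStateByKey(update_member_state)` on a `(member, state)` pair DStream, with checkpointing enabled as `updateStateByKey` requires; keeping the function pure like this also makes the expiry logic easy to unit-test outside Spark.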
My stream is basically a bunch of JSON messages along the lines of { "member": "...", "beacon": "...", "state": "enter" | "exit" }. This information is invalidated at certain intervals: message counts daily, and geofence info every few minutes.

First I considered broadcast variables, but this state gets updated frequently, so I do not think I can periodically rebroadcast it from the driver. Then I thought this might be a perfect case for updateStateByKey, since I can track what is going on, keep timestamps inside the values, and return None to "pop" expired entries. What I cannot wrap my head around is how to use this stateful stream in conjunction with other information that is coming in as DStreams/RDDs. All the examples for updateStateByKey basically apply it to a stream and then foreach over the result to persist it in a store. I do not think writing to and reading from Cassandra on every batch to get this info is a good idea, because I might get stale info.

Is this a valid use case, or am I missing the point of this function?

Thanks,
Artyom

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-StreamingStatefull-information-tp25160.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.