Hi,

I am working on a use case where clients walk in and out of geofences and we send messages based on that. I currently have some in-memory broadcast variables to do lookups for client and geofence info; some of this also comes from Cassandra. My current quandary is that I need to support the case where a user goes in and out of a geofence, while also tracking how many messages have already been sent and applying some logic based on that.
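The enter/exit tracking with a per-member message count could be expressed as the update function passed to Spark Streaming's `updateStateByKey`, which calls it per key with the batch's new values and the previous state, and drops the key when it returns `None`. This is only a sketch: the state shape, the daily cap of 3, and the field names are my assumptions, not anything established in this thread.

```python
import time

DAILY_WINDOW_MS = 24 * 60 * 60 * 1000  # invalidate message counters after a day
DAILY_CAP = 3  # arbitrary daily cap on messages per member (assumption)

def update_member_state(events, state, now_ms=None):
    """Shaped like the function given to DStream.updateStateByKey:
    (new_values, previous_state) -> new_state; returning None "pops"
    the key from the state RDD so stale entries do not accumulate.

    events: list of "enter"/"exit" strings seen for this member this batch
    state:  dict like {"inside": bool, "sent": int, "ts": int}, or None
    """
    now_ms = int(time.time() * 1000) if now_ms is None else now_ms
    if not events:
        # no new activity: expire the entry once it is older than a day
        if state is None or now_ms - state["ts"] >= DAILY_WINDOW_MS:
            return None
        return state
    # reset the daily counter if the previous state has aged out
    if state is None or now_ms - state["ts"] >= DAILY_WINDOW_MS:
        state = {"inside": False, "sent": 0, "ts": now_ms}
    inside, sent = state["inside"], state["sent"]
    for ev in events:
        if ev == "enter":
            if not inside and sent < DAILY_CAP:
                sent += 1  # a real job would also emit the message here
            inside = True
        elif ev == "exit":
            inside = False
    return {"inside": inside, "sent": sent, "ts": now_ms}
```

Wired up it would look roughly like `parsed.updateStateByKey(update_member_state)` on a `(member, state)` pair DStream, with checkpointing enabled as `updateStateByKey` requires; keeping the function pure like this also makes the expiry logic easy to unit-test outside Spark.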
My stream is basically a bunch of JSON messages along the lines of { "member": "...", "beacon": "...", "state": "enter" | "exit" }. This information is invalidated at certain intervals: message counts daily, and geofence info every few minutes.

First I considered broadcast variables, but this state gets updated frequently, so I do not think I can periodically rebroadcast it from the driver. Then I thought this might be a perfect case for updateStateByKey, since I can track what is going on, keep timestamps inside the values, and return None to "pop" expired entries. What I cannot wrap my head around is how to use this stateful stream in conjunction with other information that is coming in as DStreams/RDDs. All the examples for updateStateByKey basically apply it to a stream and then foreach over the result to persist it in a store. I do not think writing to and reading from Cassandra on every batch to get this info is a good idea, because I might get stale info.

Is this a valid use case, or am I missing the point of this function?

Thanks,
Artyom

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-StreamingStatefull-information-tp25160.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.