What are the pros and cons of state in Spark Streaming, and what is the
general idea behind it? By state I mean the state created by "mapWithState"
(or updateStateByKey).

When should it be used and when not? Is it a good idea to accumulate state
in jobs that run continuously for years?

Example: remember the IP addresses of returning visitors. The key is an IP
address and the state is a boolean set to true if we have seen the same IP
before. Let's start the job now and let it run forever.
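
To make the example concrete, here is a minimal sketch of the per-key update
logic, written as a plain-Python model rather than a real Spark job. The
update function has the shape PySpark's updateStateByKey expects (new values
for the key in this batch, plus the previous state or None). The names
update_returning and run_batches are made up for illustration, and the driver
loop only imitates how the function would be applied batch after batch.

```python
def update_returning(new_values, prev_state):
    """Per-key update: state is True once the IP has been seen more than once.

    new_values: values that arrived for this key in the current batch.
    prev_state: previous boolean state, or None if the key is new.
    """
    if prev_state is None:
        # First batch for this key: it is "returning" only if it already
        # showed up more than once inside this batch.
        return len(new_values) > 1
    if new_values:
        # Key was known before and appeared again.
        return True
    # Key did not appear in this batch: keep whatever we had.
    return prev_state


def run_batches(batches):
    """Hypothetical stand-in for the streaming engine (assumption, not Spark).

    Applies update_returning to every known key for every batch. Note that
    nothing in this model ever removes a key, so the dict only grows.
    """
    state = {}
    for batch in batches:
        grouped = {}
        for ip in batch:
            grouped.setdefault(ip, []).append(1)
        for ip in set(state) | set(grouped):
            state[ip] = update_returning(grouped.get(ip, []), state.get(ip))
    return state


if __name__ == "__main__":
    batches = [["10.0.0.1", "10.0.0.2"], ["10.0.0.1"], []]
    print(run_batches(batches))
```

In this model, an IP seen in two different batches (or twice in the same
batch) ends up with state true, and every IP ever seen stays in the state.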

What happens to the state if we stop and then start the app? When can we
lose the state and never be able to recover it?

Too many questions, I know.

Thanks

Rado
