Andre Schumacher created SPARK-6462:
---------------------------------------
Summary: UpdateStateByKey should allow inner join of new with old keys
Key: SPARK-6462
URL: https://issues.apache.org/jira/browse/SPARK-6462
Project: Spark
Issue Type: Improvement
Components: Streaming
Affects Versions: 1.3.0
Reporter: Andre Schumacher
In a nutshell: provide an inner join instead of a cogroup for updateStateByKey
in StateDStream.
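To illustrate the difference, here is a minimal sketch in plain Python (no
Spark involved) that simulates a single batch. The helper names
cogroup_update and inner_join_update are hypothetical and exist only for this
example; they are not part of the Spark API. The update function follows the
(new values, old state) shape that updateStateByKey expects.

```python
def running_count(new_values, old_state):
    """Update function: add the new values to the previous count."""
    return (old_state or 0) + sum(new_values)


def cogroup_update(state, batch, update):
    """Cogroup-like semantics: visit every key found in the old state
    or in the new batch, even keys that received no new data."""
    return {
        key: update(batch.get(key, []), state.get(key))
        for key in state.keys() | batch.keys()
    }


def inner_join_update(state, batch, update):
    """Inner-join-like semantics: visit only keys that have both
    old state and new data in this batch."""
    return {
        key: update(batch[key], state[key])
        for key in state.keys() & batch.keys()
    }


old_state = {"a": 1, "b": 2, "c": 3}
new_batch = {"b": [1, 1], "d": [5]}

full = cogroup_update(old_state, new_batch, running_count)
touched = inner_join_update(old_state, new_batch, running_count)
```

With the cogroup variant all four keys are visited and emitted; with the
inner-join variant only "b" is. Note that a strict inner join also skips "d",
a key that is new in this batch, so how new keys should be handled is an open
design question under this proposal.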
Details:
It is common to read data (say weblog data) from a streaming source (say Kafka)
and each time update the state of a relatively small number of keys.
If only the state changes need to be propagated to a downstream sink, then one
could avoid filtering out unchanged state in the user program and instead
provide this functionality in the API (say, by adding an
updateStateChangesByKey method).
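For comparison, the user-side workaround can be sketched roughly as follows,
again in plain Python without Spark. The state carries a "changed" flag, and
the program filters on it before writing to the sink. The names
update_with_flag, apply_batch and changed_only are hypothetical and exist
only for this illustration.

```python
def update_with_flag(new_values, old_state):
    """Update function whose state is a (count, changed) pair.
    The flag records whether this batch touched the key."""
    count = old_state[0] if old_state is not None else 0
    if not new_values:
        return (count, False)  # no new data: state unchanged
    return (count + sum(new_values), True)


def apply_batch(state, batch, update):
    """Simulates one updateStateByKey step over all known keys."""
    return {
        key: update(batch.get(key, []), state.get(key))
        for key in state.keys() | batch.keys()
    }


def changed_only(state):
    """The filtering the user program has to do today before
    sending state to a downstream sink."""
    return {key: count for key, (count, changed) in state.items() if changed}


state = {"a": (1, False), "b": (2, False)}
batch = {"b": [3], "c": [1]}

state = apply_batch(state, batch, update_with_flag)
to_sink = changed_only(state)
```

Here every key is still visited and carried in the state on each batch, and
the extra flag plus filter step is the work that an API-level method would
make unnecessary.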
Note that this is related but not identical to:
https://issues.apache.org/jira/browse/SPARK-2629