Andre Schumacher created SPARK-6462:
---------------------------------------

             Summary: UpdateStateByKey should allow inner join of new with old 
keys
                 Key: SPARK-6462
                 URL: https://issues.apache.org/jira/browse/SPARK-6462
             Project: Spark
          Issue Type: Improvement
          Components: Streaming
    Affects Versions: 1.3.0
            Reporter: Andre Schumacher



In a nutshell: provide an inner join instead of a cogroup for updateStateByKey 
in StateDStream.

Details:

It is common to read data (say weblog data) from a streaming source (say Kafka) 
and, in each batch, update the state of only a relatively small number of keys.

If only the state changes need to be propagated to a downstream sink, then one 
could avoid filtering out unchanged state in the user program and instead 
provide this functionality in the API (say by adding an updateStateChangesByKey 
method).
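
To make the difference concrete, here is a minimal sketch in plain Python that 
models the two behaviours with dictionaries instead of DStreams. It is not 
Spark code: update_state_changes_by_key is a hypothetical stand-in for the 
proposed method, and the "only visit keys present in the new batch" semantics 
is one possible reading of the proposal, not a settled design.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

State = Dict[str, int]
Batch = Iterable[Tuple[str, int]]
UpdateFn = Callable[[List[int], Optional[int]], Optional[int]]


def group_by_key(batch: Batch) -> Dict[str, List[int]]:
    """Collect all values of one batch under their key."""
    grouped: Dict[str, List[int]] = {}
    for key, value in batch:
        grouped.setdefault(key, []).append(value)
    return grouped


def update_state_by_key(state: State, batch: Batch, update: UpdateFn) -> State:
    """Cogroup-style model: `update` runs for every key that appears in the
    old state or in the new batch, so unchanged keys are re-emitted too."""
    grouped = group_by_key(batch)
    new_state: State = {}
    for key in state.keys() | grouped.keys():
        result = update(grouped.get(key, []), state.get(key))
        if result is not None:  # returning None drops the key
            new_state[key] = result
    return new_state


def update_state_changes_by_key(
    state: State, batch: Batch, update: UpdateFn
) -> Tuple[State, State]:
    """Hypothetical model of the proposal: `update` runs only for keys in the
    new batch. Returns (full new state, entries touched by this batch)."""
    new_state = dict(state)
    changes: State = {}
    for key, values in group_by_key(batch).items():
        result = update(values, state.get(key))
        if result is None:
            new_state.pop(key, None)
        else:
            new_state[key] = result
            changes[key] = result
    return new_state, changes


def running_count(values: List[int], old: Optional[int]) -> int:
    """Example update function: add the new values to the previous total."""
    return (old or 0) + sum(values)


state = {"a": 10, "b": 20, "c": 30}
batch = [("a", 1), ("a", 2), ("d", 5)]

full = update_state_by_key(state, batch, running_count)
new_state, changes = update_state_changes_by_key(state, batch, running_count)
```

In this model both functions arrive at the same full state, but only the 
second one yields the touched entries ("a" and "d") without a separate filter 
step. One trade-off in the sketch: keys absent from the batch are never 
visited, so the update function cannot expire or otherwise modify them.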

Note that this is related but not identical to:
https://issues.apache.org/jira/browse/SPARK-2629



