[ https://issues.apache.org/jira/browse/SPARK-18791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tathagata Das reassigned SPARK-18791:
-------------------------------------

    Assignee: Tathagata Das

> Stream-Stream Joins
> -------------------
>
>                 Key: SPARK-18791
>                 URL: https://issues.apache.org/jira/browse/SPARK-18791
>             Project: Spark
>          Issue Type: New Feature
>          Components: Structured Streaming
>            Reporter: Michael Armbrust
>            Assignee: Tathagata Das
>
> Stream-stream join is a much-requested but missing feature in Structured
> Streaming. While the join API exists in Datasets and DataFrames, it throws
> UnsupportedOperationException when applied between two streaming
> Datasets/DataFrames. To support this, we have to maintain the same semantics
> as other Structured Streaming operations: the result of the operation after
> consuming two data streams up to positions/offsets X and Y, respectively,
> must be the same as a single batch join operation on all the data up to
> positions X and Y, respectively. To achieve this, the execution has to buffer
> past data (i.e. streaming state) from each stream, so that future data can be
> matched against past data. Here are a few high-level requirements.
> - Buffer past rows as streaming state (using StateStore), and join new rows
> with the past rows.
> - Support state cleanup using the event time watermark when possible.
> - Support different types of joins (inner, left outer, and right outer are in
> highest demand for ETL/enrichment type use cases [kafka -> best-effort enrich
> -> write to S3])
> - Support cascading join operations (i.e. joining more than 2 streams)
> - Support multiple output modes (Append mode is in highest demand for
> enabling ETL/enrichment type use cases)
> All the work to incrementally build this is going to be represented by this
> JIRA, with specific subtasks for each step. At this point, the rough
> direction is as follows:
> - Implement stream-stream inner join in Append Mode, supporting multiple
> cascaded joins.
> - Extend it to stream-stream left/right outer joins in Append Mode.
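
The semantics requirement in the description above (the incremental result after consuming the two streams up to offsets X and Y must equal a batch join over the same data) can be illustrated with a toy symmetric hash join. This is a minimal pure-Python sketch, not Spark's implementation: the class SymmetricHashJoin, its methods, and batch_join are hypothetical names, plain dictionaries stand in for StateStore, and watermark-based state cleanup is omitted, so the buffered state grows without bound.

```python
from collections import defaultdict


class SymmetricHashJoin:
    """Toy incremental inner join that buffers every past row from both sides.

    Hypothetical illustration only; in-memory dicts stand in for StateStore.
    """

    def __init__(self):
        self.left_state = defaultdict(list)   # key -> buffered left values
        self.right_state = defaultdict(list)  # key -> buffered right values

    def process_left(self, key, value):
        # Buffer the new left row, then match it against past right rows.
        self.left_state[key].append(value)
        return [(key, value, r) for r in self.right_state.get(key, [])]

    def process_right(self, key, value):
        # Buffer the new right row, then match it against past left rows.
        self.right_state[key].append(value)
        return [(key, l, value) for l in self.left_state.get(key, [])]


def batch_join(left_rows, right_rows):
    """Reference result: a plain inner join over all rows seen so far."""
    return [(lk, lv, rv)
            for lk, lv in left_rows
            for rk, rv in right_rows
            if lk == rk]


left = [("a", 1), ("b", 2), ("a", 3)]
right = [("a", "x"), ("c", "y"), ("a", "z")]

join = SymmetricHashJoin()
incremental = []
# Interleave the two inputs to mimic rows arriving from two streams.
for l_row, r_row in zip(left, right):
    incremental += join.process_left(*l_row)
    incremental += join.process_right(*r_row)

# The incrementally emitted rows match the batch join over the same data.
assert sorted(incremental) == sorted(batch_join(left, right))
```

In this sketch each arriving row is matched only against rows already buffered on the other side, so every matching pair is emitted exactly once regardless of which side arrives first. That property is what makes an inner join a natural fit for Append mode.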