Jungtaek Lim created SPARK-45672:
------------------------------------

             Summary: Provide a unified user-facing schema for state format 
versions in state data source - reader
                 Key: SPARK-45672
                 URL: https://issues.apache.org/jira/browse/SPARK-45672
             Project: Spark
          Issue Type: Improvement
          Components: Structured Streaming
    Affects Versions: 4.0.0
            Reporter: Jungtaek Lim


As of now, except for stream-stream join with the joinSide option specified, 
the state data source provides the state "as is" in the state store. This 
means the state data source exposes a different schema per state format 
version for operators that have multiple state format versions.

From the users' perspective, they do not care about the state format version, 
and hence may be confused if the state data source produces different schemas.

That said, we could probably consider defining and providing the same 
user-facing schema for each operator.
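
To make the idea concrete, here is a minimal sketch in plain Python (no Spark 
required). It assumes, purely for illustration, that one state format version 
repeats the grouping columns inside the value while another keeps them only in 
the key. The function name to_unified_row, the row layout, and the field names 
are all hypothetical and are not part of the state data source.

```python
def to_unified_row(state_row, key_fields):
    """Project a raw state row onto one hypothetical user-facing shape.

    Grouping columns are dropped from the value whenever they appear there,
    so both assumed layouts end up with the same {key, value} structure.
    """
    key = dict(state_row["key"])
    value = {
        name: field
        for name, field in state_row["value"].items()
        if name not in key_fields
    }
    return {"key": key, "value": value}


# Assumed layout A: grouping column "user" is duplicated in the value.
row_a = {"key": {"user": "a"}, "value": {"user": "a", "count": 3}}
# Assumed layout B: grouping column "user" lives only in the key.
row_b = {"key": {"user": "a"}, "value": {"count": 3}}

unified_a = to_unified_row(row_a, ["user"])
unified_b = to_unified_row(row_b, ["user"])
```

Under these assumptions both rows map to the same structure. The projection 
has to know how each operator lays out its state, which is the coupling 
concern raised in this issue.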

*Note that this would need further discussion* before coming up with code, 
because there is a clear trade-off: it creates a strong coupling between the 
state data source and the implementation of stateful operators. Also, 
regarding the concern about an unpredictable output schema, users can call 
printSchema() to see the output schema prior to querying.
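
As a rough sketch of that workflow in PySpark, the helper below loads the 
state for one operator and prints its schema before any query is run. It 
assumes the reader is registered under the short name "statestore" and accepts 
an "operatorId" option; the helper name show_state_schema and the checkpoint 
path are hypothetical.

```python
def show_state_schema(spark, checkpoint_path, operator_id=0):
    """Load operator state as a DataFrame and print its schema.

    `spark` is an existing SparkSession. The format name "statestore" and
    the "operatorId" option are assumptions about how the state data source
    reader is exposed.
    """
    df = (
        spark.read.format("statestore")
        .option("operatorId", operator_id)
        .load(checkpoint_path)
    )
    # Inspect the schema first, since it may differ by state format version.
    df.printSchema()
    return df
```

A call such as show_state_schema(spark, "/path/to/checkpoint") would show 
which columns the state exposes for that operator before a query is written 
against them.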


