Jungtaek Lim created SPARK-45672:
------------------------------------
Summary: Provide a unified user-facing schema for state format
versions in state data source - reader
Key: SPARK-45672
URL: https://issues.apache.org/jira/browse/SPARK-45672
Project: Spark
Issue Type: Improvement
Components: Structured Streaming
Affects Versions: 4.0.0
Reporter: Jungtaek Lim
As of now, except for stream-stream join with the joinSide option specified,
the state data source provides the state "as it is" in the state store. This
means the state data source exposes a different schema per state format
version for operators that have multiple state format versions.
From the users' perspective, they do not care about the state format version,
and hence may be confused if the state data source produces different schemas.
That said, we could consider defining and providing the same user-facing
schema for each operator.
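To make the idea concrete, here is a minimal sketch in plain Python (no Spark
involved). Everything in it is an assumption for illustration: the
to_unified_row helper, the row shapes, and the rule that "version 1" repeats
the key columns inside the value while "version 2" does not. It is not the
actual layout of any operator, nor a proposed implementation.

```python
# Hypothetical illustration only: the row shapes and names below are
# assumptions, not the real layout of any Spark stateful operator.

def to_unified_row(raw_row, key_fields, format_version):
    """Map a raw state row (dict with "key" and "value" dicts) to one fixed shape."""
    key = dict(raw_row["key"])
    value = dict(raw_row["value"])
    if format_version == 1:
        # Assumed v1 layout: the value repeats the key columns, so drop them.
        value = {name: v for name, v in value.items() if name not in key_fields}
    elif format_version != 2:
        raise ValueError(f"unsupported state format version: {format_version}")
    # Assumed v2 layout: the value already holds only the non-key columns.
    return {"key": key, "value": value}


v1_row = {"key": {"user": "a"}, "value": {"user": "a", "count": 3}}
v2_row = {"key": {"user": "a"}, "value": {"count": 3}}

unified_v1 = to_unified_row(v1_row, ["user"], format_version=1)
unified_v2 = to_unified_row(v2_row, ["user"], format_version=2)
print(unified_v1 == unified_v2)
```

Both versions end up with the same user-facing shape, but only because the
helper knows how each version lays out its rows, so the reader would have to
track every operator's format versions.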
*Note that this would need further discussion* before coming up with code,
because there is a clear trade-off: it strongly couples the state data source
to the implementation of stateful operators. Also, regarding the argument
about an unpredictable output schema, users can call printSchema() to see
the output schema prior to running a query.