[ https://issues.apache.org/jira/browse/SPARK-27237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16798465#comment-16798465 ]
Jungtaek Lim commented on SPARK-27237:
--------------------------------------

Working on this.

> Introduce State schema validation among query restart
> -----------------------------------------------------
>
>                 Key: SPARK-27237
>                 URL: https://issues.apache.org/jira/browse/SPARK-27237
>             Project: Spark
>          Issue Type: Improvement
>          Components: Structured Streaming
>    Affects Versions: 3.0.0
>            Reporter: Jungtaek Lim
>            Priority: Major
>
> Even though the Spark Structured Streaming guide clearly documents that "Any
> change in number or type of grouping keys or aggregates is not allowed.",
> Spark does not check for such a change when end users make one, which can
> end up with nondeterministic outputs or unexpected exceptions.
> Even worse, if the query happens not to crash, it can write the new,
> corrupted values to the state, which completely breaks the state unless end
> users roll back to a specific batch by manually editing the checkpoint.
> The restriction is clear: the number of columns, and the data type of each,
> must not be modified across query runs. We can store the schema of the state
> along with the state, and on restart verify whether the (possibly new) schema
> is compatible with the stored one.
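A minimal sketch of the proposal in the last paragraph of the issue, assuming a state schema can be modeled as a list of (column name, type name) pairs for the key and for the value. The file name `_state_schema.json`, the functions `validate_or_store` and `check_compatible`, and the exception class are all hypothetical; they do not reflect how Spark lays out its checkpoint or how this issue was eventually resolved. Column names are deliberately ignored, because the rule as stated only covers the number of columns and the type of each.

```python
import json
import os
import tempfile

# Hypothetical file name for the persisted schema; not Spark's actual checkpoint layout.
SCHEMA_FILE = "_state_schema.json"


class StateSchemaIncompatible(Exception):
    """Raised when a restarted query's state schema does not match the stored one."""


def check_compatible(stored, new):
    """Apply the rule from the issue: same number of columns, same type for each column.

    Each schema is a list of (column_name, type_name) pairs; names are not compared.
    """
    if len(stored) != len(new):
        raise StateSchemaIncompatible(
            f"number of state columns changed: {len(stored)} -> {len(new)}")
    for i, ((_, old_type), (_, new_type)) in enumerate(zip(stored, new)):
        if old_type != new_type:
            raise StateSchemaIncompatible(
                f"type of state column {i} changed: {old_type} -> {new_type}")


def validate_or_store(checkpoint_dir, key_schema, value_schema):
    """On the first run, persist the schema next to the state; on restart, verify it."""
    path = os.path.join(checkpoint_dir, SCHEMA_FILE)
    new = {"key": key_schema, "value": value_schema}
    if not os.path.exists(path):
        # First run: nothing to compare against, so record the schema.
        with open(path, "w") as f:
            json.dump(new, f)
        return
    # Restart: JSON turns the tuples into 2-element lists, which unpack the same way.
    with open(path) as f:
        stored = json.load(f)
    for part in ("key", "value"):
        check_compatible(stored[part], new[part])


# Usage: simulate three runs of the same query against one checkpoint directory.
with tempfile.TemporaryDirectory() as ckpt:
    keys = [("userId", "string")]
    validate_or_store(ckpt, keys, [("count", "long")])  # first run: schema stored
    validate_or_store(ckpt, keys, [("total", "long")])  # renamed, same type: accepted
    try:
        validate_or_store(ckpt, keys, [("count", "long"), ("sum", "double")])
    except StateSchemaIncompatible as e:
        print(e)  # number of state columns changed: 1 -> 2
```

Failing fast at restart, before any batch runs, is the point of the check: the query stops with a clear error instead of writing values in the new layout into state stored in the old one.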