Jungtaek Lim created SPARK-45671:
------------------------------------
Summary: Implement an option similar to corrupt record column in
State Data Source Reader
Key: SPARK-45671
URL: https://issues.apache.org/jira/browse/SPARK-45671
Project: Spark
Issue Type: Sub-task
Components: Structured Streaming
Affects Versions: 4.0.0
Reporter: Jungtaek Lim
Querying the state will most likely fail if the underlying state file is
corrupted. A related case is when the raw binary data that the state store
reads from the state file does not match the state schema, which ends up as an
exception or a fatal error at runtime.
(We cannot catch the case where data is loaded with an incorrect schema but no
exception is thrown, because we cannot store the schema alongside every row.)
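To illustrate why only some mismatches are detectable, here is a minimal sketch
that uses Python's `struct` module as a stand-in for the state store's binary
row format. This is an assumption for illustration only: Spark's state store
does not use `struct`, and the schema `(count: int64, version: int32)` is
hypothetical. A value whose size does not fit the schema raises an error, while
a value with the same size but a different layout decodes silently into wrong
values.

```python
import struct

# Hypothetical state schema: (count: int64, version: int32), little-endian, 12 bytes.
STATE_FORMAT = "<qi"


def decode_state(raw: bytes) -> tuple:
    """Decode a raw state value; raises struct.error if the size does not fit."""
    return struct.unpack(STATE_FORMAT, raw)


good = struct.pack("<qi", 42, 1)          # written with the expected schema
too_long = struct.pack("<qq", 42, 1)      # 16 bytes: size mismatch, detectable
wrong_layout = struct.pack("<iq", 42, 1)  # 12 bytes, different field order

print(decode_state(good))  # (42, 1)

try:
    decode_state(too_long)
except struct.error as e:
    print("detected:", e)

# Same size, different layout: no exception, but the decoded values are garbage.
print(decode_state(wrong_layout))  # (4294967338, 0)
```

The second kind of mismatch is the one the parenthetical refers to: without
per-row schema information, nothing distinguishes it from valid data.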
To handle the above cases without failing the query, we want to provide an
option that, when specified, returns state rows for valid entries and the raw
binary data for corrupted entries, as we already do for CSV/JSON.
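For CSV/JSON, Spark's PERMISSIVE mode keeps malformed records in a column named
by `columnNameOfCorruptRecord` (default `_corrupt_record`). The sketch below
shows the analogous behavior proposed here, under stated assumptions: the
function `read_state`, its `corrupt_column` parameter, and the reuse of the
`_corrupt_record` name are hypothetical, since the issue does not define the
option name, and `struct` again stands in for the real state row format.

```python
import struct
from typing import Iterable, Optional

STATE_FORMAT = "<qi"                # hypothetical schema: (count: int64, version: int32)
CORRUPT_COLUMN = "_corrupt_record"  # borrowed from the CSV/JSON default; hypothetical here


def read_state(raw_rows: Iterable[bytes], corrupt_column: Optional[str] = None) -> list:
    """Decode raw state rows.

    Without corrupt_column, the first undecodable row fails the whole read.
    With it, valid rows are decoded and corrupted rows keep their raw bytes
    in that column, mirroring PERMISSIVE mode for CSV/JSON.
    """
    result = []
    for raw in raw_rows:
        try:
            count, version = struct.unpack(STATE_FORMAT, raw)
        except struct.error:
            if corrupt_column is None:
                raise  # current behavior: the query fails
            # Corrupted row: null out the schema fields, keep the raw bytes.
            result.append({"count": None, "version": None, corrupt_column: raw})
            continue
        row = {"count": count, "version": version}
        if corrupt_column is not None:
            row[corrupt_column] = None  # valid row: corrupt column is null
        result.append(row)
    return result


rows = [struct.pack("<qi", 7, 1), b"\x00\x01\x02", struct.pack("<qi", 9, 2)]
for row in read_state(rows, corrupt_column=CORRUPT_COLUMN):
    print(row)
```

As with CSV/JSON, this sketch cannot flag rows that decode without error but
with the wrong schema; it only recovers from failures that raise.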
--
This message was sent by Atlassian Jira
(v8.20.10#820010)