Jungtaek Lim created SPARK-45671:
------------------------------------

             Summary: Implement an option similar to corrupt record column in
                      State Data Source Reader
                 Key: SPARK-45671
                 URL: https://issues.apache.org/jira/browse/SPARK-45671
             Project: Spark
          Issue Type: Sub-task
          Components: Structured Streaming
    Affects Versions: 4.0.0
            Reporter: Jungtaek Lim


Querying the state will most likely fail if the underlying state file is
corrupted. There may also be cases where the raw binary data read from the
state file does not match the state schema, resulting in an exception or a
fatal error at runtime.

(We cannot catch cases where data is decoded with an incorrect schema but no
exception is thrown, because we cannot store the schema alongside every row.)

To handle the above cases without failing the query, we want to return state
rows for the valid entries and, if the user specifies an option, also expose
the raw binary data for the corrupted rows (as we do for CSV/JSON).
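For illustration only, here is a minimal Python sketch of the proposed
behavior, loosely modeled on how the CSV/JSON readers keep malformed records
instead of failing. It is not Spark's implementation or API: the function
`read_state_rows`, the `corrupt_record_column` flag, and the use of `struct`
formats as a stand-in "state schema" are all hypothetical assumptions.

```python
import struct
from typing import Iterable, List, Optional, Tuple

# Hypothetical stand-in for a state schema: the key is one little-endian
# signed 64-bit int, the value is a signed 64-bit int plus a 32-bit float.
KEY_FORMAT = "<q"
VALUE_FORMAT = "<qf"

# (decoded key, decoded value, raw (key_bytes, value_bytes) if corrupted)
StateRow = Tuple[Optional[tuple], Optional[tuple], Optional[Tuple[bytes, bytes]]]


def read_state_rows(
    raw_rows: Iterable[Tuple[bytes, bytes]],
    corrupt_record_column: bool = False,
) -> List[StateRow]:
    """Decode raw (key_bytes, value_bytes) pairs against the expected schema.

    Without the option, the first row that cannot be decoded raises
    (fail-fast, like today). With the option, that row is still returned,
    with key/value set to None and the raw bytes kept in a separate
    "corrupt record" field so the user can inspect them.
    """
    result: List[StateRow] = []
    for key_bytes, value_bytes in raw_rows:
        try:
            key = struct.unpack(KEY_FORMAT, key_bytes)
            value = struct.unpack(VALUE_FORMAT, value_bytes)
            result.append((key, value, None))
        except struct.error:
            # Byte length does not fit the schema: this row is corrupted.
            if not corrupt_record_column:
                raise
            result.append((None, None, (key_bytes, value_bytes)))
    return result


if __name__ == "__main__":
    good = (struct.pack(KEY_FORMAT, 1), struct.pack(VALUE_FORMAT, 10, 0.5))
    bad = (struct.pack(KEY_FORMAT, 2), b"\x00\x01")  # truncated value
    for row in read_state_rows([good, bad], corrupt_record_column=True):
        print(row)
```

Note that this sketch shares the limitation described above: any 12 bytes
will decode as a value under `VALUE_FORMAT`, so data written with a different
schema but the same byte length is not detected as corrupted.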



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
