[
https://issues.apache.org/jira/browse/SPARK-53870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jungtaek Lim updated SPARK-53870:
---------------------------------
Fix Version/s: 4.0.2
> Python streaming transform_with_state StateServer does not fully read large
> state values
> ----------------------------------------------------------------------------------------
>
> Key: SPARK-53870
> URL: https://issues.apache.org/jira/browse/SPARK-53870
> Project: Spark
> Issue Type: Bug
> Components: Structured Streaming
> Affects Versions: 4.1.0, 4.0.0, 4.0.1
> Reporter: Jason Teoh
> Assignee: Jason Teoh
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.1.0, 4.0.2
>
>
> The TransformWithState StateServer's {{parseProtoMessage}} method uses
> {{read}} (InputStream/FilterInputStream), which returns only the bytes that
> are currently available and may stop short of the full message. We should use
> the [readFully DataInputStream
> API|https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/io/DataInput.html#readFully(byte%5B%5D)]
> instead, which keeps reading until it fills the provided buffer.
> In addition to the linked API above, this StackOverflow post also illustrates
> the difference between the two APIs: [https://stackoverflow.com/a/25900095]
> Without this change, it is possible for the state server to fail to fully
> read large proto messages (e.g., those containing a large state value update)
> and run into a parsing error.
>
> Affected versions were identified from the tags on the original PR. The bug
> seems to have been present since the state server was introduced:
> [https://github.com/apache/spark/commit/def42d44405af5df78c3039ac5ad0f8a0469efaa]
>
> In practice this seems like an uncommon scenario (the bug was identified and
> confirmed with a 512KB string state value update, which likely produces a
> proto message much larger than typical use cases).
>
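The difference between {{read}} and {{readFully}} described in the issue can be reproduced in isolation. The following is a minimal sketch, not the actual StateServer code: the class name ReadFullyDemo, the ChunkedInputStream helper, and the 8192-byte chunk size are all hypothetical, chosen only to imitate a stream that delivers a large message in pieces. Only the java.io calls are real APIs.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ReadFullyDemo {

    /**
     * Hypothetical helper: hands out at most maxChunk bytes per read call,
     * imitating a socket that delivers a large message in several pieces.
     */
    static final class ChunkedInputStream extends FilterInputStream {
        private final int maxChunk;

        ChunkedInputStream(InputStream in, int maxChunk) {
            super(in);
            this.maxChunk = maxChunk;
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            // Cap the request so a single call never returns the whole message.
            return super.read(b, off, Math.min(len, maxChunk));
        }
    }

    /** One call to read(byte[]): returns however many bytes that call delivered. */
    public static int bytesFromSingleRead(byte[] payload, int maxChunk) throws IOException {
        try (InputStream in =
                 new ChunkedInputStream(new ByteArrayInputStream(payload), maxChunk)) {
            byte[] buf = new byte[payload.length];
            return in.read(buf);
        }
    }

    /** readFully: loops internally until the buffer is full (or throws EOFException). */
    public static byte[] bytesFromReadFully(byte[] payload, int maxChunk) throws IOException {
        try (DataInputStream in = new DataInputStream(
                 new ChunkedInputStream(new ByteArrayInputStream(payload), maxChunk))) {
            byte[] buf = new byte[payload.length];
            in.readFully(buf);
            return buf;
        }
    }

    public static void main(String[] args) throws IOException {
        // 512KB payload, the size mentioned in the issue description.
        byte[] payload = new byte[512 * 1024];
        for (int i = 0; i < payload.length; i++) {
            payload[i] = (byte) i;
        }
        int partial = bytesFromSingleRead(payload, 8192);
        byte[] full = bytesFromReadFully(payload, 8192);
        System.out.println("single read returned " + partial + " bytes");
        System.out.println("readFully filled " + full.length + " bytes");
    }
}
```

With a single {{read}} call the buffer is only partly filled, so a parser handed that buffer would see a truncated message. {{readFully}} either fills the buffer or throws EOFException, which makes a short read an explicit error rather than a silent truncation.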
--
This message was sent by Atlassian Jira
(v8.20.10#820010)