[
https://issues.apache.org/jira/browse/SPARK-30294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jungtaek Lim updated SPARK-30294:
---------------------------------
Affects Version/s: 3.1.0 (was: 3.0.0)
> Read-only state store unnecessarily creates and deletes the temp file for
> delta file every batch
> ------------------------------------------------------------------------------------------------
>
> Key: SPARK-30294
> URL: https://issues.apache.org/jira/browse/SPARK-30294
> Project: Spark
> Issue Type: Improvement
> Components: Structured Streaming
> Affects Versions: 3.1.0
> Reporter: Jungtaek Lim
> Priority: Minor
>
> [https://github.com/apache/spark/blob/d38f8167483d4d79e8360f24a8c0bffd51460659/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/HDFSBackedStateStoreProvider.scala#L143-L155]
> {code:java}
> /** Abort all the updates made on this store. This store will not be usable any more. */
> override def abort(): Unit = {
>   // This if statement is to ensure that files are deleted only if there are changes to the
>   // StateStore. We have two StateStores for each task, one which is used only for reading, and
>   // the other used for read+write. We don't want the read-only to delete state files.
>   if (state == UPDATING) {
>     state = ABORTED
>     cancelDeltaFile(compressedStream, deltaFileStream)
>   } else {
>     state = ABORTED
>   }
>   logInfo(s"Aborted version $newVersion for $this")
> }
> {code}
> Despite the comment, the read-only state store also goes through the same steps
> as preparing for a write: it creates the temporary file, initializes output
> streams for the file, closes these output streams, and deletes the temporary
> file. This is unnecessary and confusing, because the log messages make it look
> as if two different instances are writing to the same delta file.
>
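One possible way to avoid the redundant work is to create the temporary delta file lazily, on the first write, so that a store used only for reading never touches the file system. The following is a minimal sketch outside of Spark, not the actual patch: `SketchStore`, `deltaStream`, and `everCreatedTempFile` are hypothetical names, and plain `java.nio.file` calls stand in for whatever the real provider uses to manage the delta file.

```scala
import java.io.OutputStream
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}

// Hypothetical, simplified stand-in for a versioned state store.
// It only illustrates lazy creation of the temporary delta file.
final class SketchStore(dir: Path, version: Long) {
  // Temp file and its output stream; None until the first write happens.
  private var delta: Option[(Path, OutputStream)] = None

  // Records whether this instance ever created a temp file (for inspection only).
  var everCreatedTempFile: Boolean = false

  // Returns the delta stream, creating the temp file on first use.
  private def deltaStream: OutputStream = delta match {
    case Some((_, out)) => out
    case None =>
      val file = Files.createTempFile(dir, s"$version.delta.", ".tmp")
      val out = Files.newOutputStream(file)
      delta = Some((file, out))
      everCreatedTempFile = true
      out
  }

  // A write is the only operation that forces the temp file into existence.
  def put(key: String, value: String): Unit =
    deltaStream.write(s"$key=$value\n".getBytes(StandardCharsets.UTF_8))

  // Abort cleans up only if a temp file was actually created.
  def abort(): Unit = {
    delta.foreach { case (file, out) =>
      out.close()
      Files.deleteIfExists(file)
    }
    delta = None
  }
}
```

With this shape, calling `abort()` on an instance that never called `put` creates and deletes nothing, so the logs would no longer suggest that two instances write to the same delta file.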