How do you intend to handle corrupted files, in particular due to
process crashes during a write?
Will all writes to a cached directory go to a file with some suffix (e.g.,
".pending") that is then renamed into place?
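To make the question concrete, here is a minimal sketch of the write-then-rename pattern I have in mind. It uses plain java.nio and is not taken from the FLIP or the draft PR; the class name AtomicFileWriter, both method names, and the ".pending" suffix are hypothetical. It also assumes the pending file and the final file live in the same directory, so the rename stays on one file system.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, for illustration only.
final class AtomicFileWriter {

    // Assumed suffix; the real naming scheme is an open question.
    static final String PENDING_SUFFIX = ".pending";

    // Writes data to "<target>.pending", flushes it to disk, then renames it
    // to <target>. A crash before the rename leaves only a ".pending" file,
    // so readers never observe a partially written <target>.
    static void writeAtomically(Path target, byte[] data) throws IOException {
        Path pending = target.resolveSibling(target.getFileName() + PENDING_SUFFIX);
        try (FileChannel channel = FileChannel.open(
                pending,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            ByteBuffer buffer = ByteBuffer.wrap(data);
            while (buffer.hasRemaining()) {
                channel.write(buffer);
            }
            // Force content and metadata to the storage device before renaming.
            channel.force(true);
        }
        // Throws AtomicMoveNotSupportedException if the file system cannot
        // perform the move atomically. Whether an existing target is replaced
        // is implementation specific.
        Files.move(pending, target, StandardCopyOption.ATOMIC_MOVE);
    }

    // Deletes leftover ".pending" files in dir, e.g. on process start-up
    // after a crash. Does not descend into subdirectories.
    static void cleanUpPending(Path dir) throws IOException {
        try (DirectoryStream<Path> leftovers =
                Files.newDirectoryStream(dir, "*" + PENDING_SUFFIX)) {
            for (Path leftover : leftovers) {
                Files.deleteIfExists(leftover);
            }
        }
    }
}
```

With something like this, recovery would only need to call cleanUpPending on the working directory and could then trust every file without the suffix. It would not protect against corruption that happens after a successful rename (e.g., disk errors); that would need checksums or similar.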
On 10/12/2021 17:54, Till Rohrmann wrote:
Hi everyone,
I would like to start a discussion about introducing an explicit working
directory for Flink processes that can be used to store information [1].
By default, this working directory will reside in the temporary directory
of the node Flink runs on. However, if configured to reside on a persistent
volume, then this information can be used to recover from process/node
failures. Moreover, such a working directory can be used to consolidate
some of the other directories Flink creates under /tmp (e.g. blobStorage,
RocksDB working directory).
Here is a draft PR that outlines the required changes [2].
Looking forward to your feedback.
[1] https://cwiki.apache.org/confluence/x/ZZiqCw
[2] https://github.com/apache/flink/pull/18083
Cheers,
Till