Yes. With Flink, for example, checkpoints are taken periodically, and you can configure them to be stored in the cloud (for example, on Google Cloud Storage). After a failure, Flink loads the last saved checkpoint and resumes processing from that point.
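
To make the checkpoint-and-restore idea concrete, here is a toy sketch in plain Python. It is not Flink's API and does not reflect how Flink implements checkpoints internally; the function name `run_with_checkpoints`, the `crash_at` parameter, and the local JSON file standing in for durable storage such as a GCS bucket are all assumptions made up for illustration.

```python
import json
import os
import tempfile


def run_with_checkpoints(records, checkpoint_path, checkpoint_every, crash_at=None):
    """Sum `records`, persisting (offset, total) every `checkpoint_every` records.

    If a checkpoint file already exists, processing resumes from it.
    `crash_at` (hypothetical) simulates a worker failure just before
    the record at that offset is processed.
    """
    offset, total = 0, 0
    if os.path.exists(checkpoint_path):
        # Restore the last saved state instead of starting from scratch.
        with open(checkpoint_path) as f:
            saved = json.load(f)
        offset, total = saved["offset"], saved["total"]

    while offset < len(records):
        if crash_at is not None and offset == crash_at:
            raise RuntimeError("simulated worker crash")
        total += records[offset]
        offset += 1
        if offset % checkpoint_every == 0:
            # Write to a temp file and rename, so a crash mid-write
            # never leaves a half-written checkpoint behind.
            tmp = checkpoint_path + ".tmp"
            with open(tmp, "w") as f:
                json.dump({"offset": offset, "total": total}, f)
            os.replace(tmp, checkpoint_path)
    return total


def demo():
    records = list(range(1, 11))  # 1..10
    with tempfile.TemporaryDirectory() as d:
        # Local file standing in for durable storage outside the worker.
        path = os.path.join(d, "checkpoint.json")
        try:
            run_with_checkpoints(records, path, checkpoint_every=3, crash_at=7)
        except RuntimeError:
            pass  # the "worker" died; its in-memory state is gone
        with open(path) as f:
            last = json.load(f)
        # A fresh run picks up from the last checkpoint and finishes the job.
        total = run_with_checkpoints(records, path, checkpoint_every=3)
    return last, total
```

In this sketch the work done after the last checkpoint (one record here) is lost with the crashed worker and replayed on restart, which is why the checkpoint has to live somewhere that survives the worker.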

On Sat, Apr 24, 2021 at 10:53 AM Evan Galpin <[email protected]> wrote:

> Thanks Reuven! I assume for other runners the semantics might differ
> significantly?
>
> Do you happen to know if the Dataflow storage model is documented
> anywhere, either in runner code or in documentation elsewhere?
>
> Thanks again,
> Evan
>
> On Sat, Apr 24, 2021 at 12:35 Reuven Lax <[email protected]> wrote:
>
>> In the case of Dataflow, storage is backed by a distributed storage
>> system, and this storage is separate from the worker node. Crashing worker
>> nodes will not cause data loss.
>>
>> At the present time though, the storage is tied to a single data center.
>>
>> Reuven
>>
>> On Sat, Apr 24, 2021 at 9:19 AM Evan Galpin <[email protected]>
>> wrote:
>>
>>> Hi all!
>>>
>>> First off, I apologize for potentially dredging up a topic which has
>>> been asked a number of times before. I’m looking for slightly
>>> more/different info than I have seen before however:
>>>
>>> I’ve seen in a number of StackOverflow answers[1][2][3] mention of the
>>> phrase “durably committed” in response to questions on the topic of
>>> streaming pipelines reading from Unbounded sources like PubSub and Kafka.
>>>
>>> I’m curious to know more about the cases where “durably committed” data
>>> is materialized or, in the case of Dataflow, saved in “Dataflow internal
>>> storage” such as when mutating state or running GBK.
>>>
>>> What durability/redundancy guarantees are there in these cases? Is
>>> “Dataflow internal storage” backed by something like Google Cloud Storage?
>>> If a pipeline has a single worker node with materialized data in the
>>> pipeline which has not yet been written to a Sink, what happens if that
>>> singular worker were to crash and vanish? Can data loss occur like this?
>>>
>>> Thanks!
>>> Evan
>>>
>>> [1]
>>> https://stackoverflow.com/a/66338947/6432284
>>> [2]
>>> https://stackoverflow.com/a/46750189/6432284
>>> [3]
>>> https://stackoverflow.com/a/37309304/6432284
>>>
>>
