Thanks again for your help 🙂

Evan
On Sat, Apr 24, 2021 at 14:01 Reuven Lax <[email protected]> wrote:

> Yes - for example with Flink, periodic checkpoints are taken and you can
> configure those checkpoints to be saved in the cloud (for example, on
> Google Cloud Storage). Upon a failure, Flink will load the last saved
> checkpoint and restart processing from that point.
>
> On Sat, Apr 24, 2021 at 10:53 AM Evan Galpin <[email protected]>
> wrote:
>
>> Thanks Reuven! I assume the semantics might differ significantly for
>> other runners?
>>
>> Do you happen to know whether the Dataflow storage model is documented
>> anywhere, either in runner code or elsewhere?
>>
>> Thanks again,
>> Evan
>>
>> On Sat, Apr 24, 2021 at 12:35 Reuven Lax <[email protected]> wrote:
>>
>>> In the case of Dataflow, storage is backed by a distributed storage
>>> system, and this storage is separate from the worker node. Crashing worker
>>> nodes will not cause data loss.
>>>
>>> At the present time, though, the storage is tied to a single data center.
>>>
>>> Reuven
>>>
>>> On Sat, Apr 24, 2021 at 9:19 AM Evan Galpin <[email protected]>
>>> wrote:
>>>
>>>> Hi all!
>>>>
>>>> First off, I apologize for potentially dredging up a topic that has
>>>> been asked about a number of times before. However, I'm looking for
>>>> slightly more/different info than I have seen so far:
>>>>
>>>> I've seen the phrase “durably committed” in a number of StackOverflow
>>>> answers[1][2][3] to questions about streaming pipelines that read from
>>>> unbounded sources like PubSub and Kafka.
>>>>
>>>> I'm curious to know more about the cases where “durably committed” data
>>>> is materialized or, in the case of Dataflow, saved in “Dataflow internal
>>>> storage”, such as when mutating state or running a GBK.
>>>>
>>>> What durability/redundancy guarantees are there in these cases? Is
>>>> “Dataflow internal storage” backed by something like Google Cloud Storage?
>>>> If a pipeline has a single worker node holding materialized data that
>>>> has not yet been written to a Sink, what happens if that single worker
>>>> crashes and vanishes? Can data loss occur this way?
>>>>
>>>> Thanks!
>>>> Evan
>>>>
>>>> [1] https://stackoverflow.com/a/66338947/6432284
>>>> [2] https://stackoverflow.com/a/46750189/6432284
>>>> [3] https://stackoverflow.com/a/37309304/6432284
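To make the checkpoint-and-restore behaviour Reuven describes for Flink a little more concrete, here is a toy Python model. It is a sketch under stated assumptions, not how Flink or Dataflow is actually implemented: `DurableStore` is a hypothetical stand-in for external checkpoint storage such as a GCS bucket, `run` and `WorkerCrash` are hypothetical names, and the source is assumed to be replayable by offset (roughly what Kafka offsets provide). Anything processed since the last checkpoint lives only in worker memory and is lost on a crash, but because each checkpoint records both the operator state and the source offset, a restarted worker replays those records and reaches the same result.

```python
import copy


class DurableStore:
    """Hypothetical stand-in for checkpoint storage that outlives any worker."""

    def __init__(self):
        self._latest = None

    def save(self, snapshot):
        # Deep-copy so later in-memory mutation cannot alter the saved checkpoint.
        self._latest = copy.deepcopy(snapshot)

    def load(self):
        return copy.deepcopy(self._latest)


class WorkerCrash(Exception):
    """Simulates the worker process disappearing."""


def run(source, store, checkpoint_every, crash_at=None):
    """Count words from a replayable source, checkpointing (offset, state) together."""
    snapshot = store.load()
    if snapshot is None:
        offset, counts = 0, {}  # no checkpoint yet: start from the beginning
    else:
        offset, counts = snapshot["offset"], snapshot["counts"]  # resume
    while offset < len(source):
        if crash_at is not None and offset == crash_at:
            raise WorkerCrash(f"worker lost at offset {offset}")
        word = source[offset]
        counts[word] = counts.get(word, 0) + 1
        offset += 1
        if offset % checkpoint_every == 0:
            # State and source position are saved in ONE snapshot.
            store.save({"offset": offset, "counts": counts})
    return counts


if __name__ == "__main__":
    source = ["a", "b", "a", "c", "a", "b"]
    store = DurableStore()
    try:
        run(source, store, checkpoint_every=2, crash_at=5)
    except WorkerCrash:
        pass  # in-memory progress since the last checkpoint (offset 4) is gone
    print(run(source, store, checkpoint_every=2))  # {'a': 3, 'b': 2, 'c': 1}
```

The important design choice in this sketch is saving the source offset and the operator state in the same snapshot; if they were persisted separately, a crash between the two writes could cause records to be counted twice or not at all.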
