Yes. For example, with Flink, periodic checkpoints are taken, and you can
configure those checkpoints to be saved in the cloud (e.g. on Google Cloud
Storage). After a failure, Flink loads the last completed checkpoint and
restarts processing from that point.
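To make the recovery behaviour concrete, here is a toy Python model of checkpoint-and-restore. It is an illustration only, not Flink's API: `CheckpointStore`, `WorkerCrash` and `run` are hypothetical names. It assumes a replayable source (such as Kafka), so a checkpoint can record the read offset alongside the operator state, and the store stands in for durable storage like a GCS bucket that lives outside the worker.

```python
import copy


class CheckpointStore:
    """Hypothetical stand-in for durable storage outside the worker (e.g. GCS)."""

    def __init__(self):
        self._latest = None

    def save(self, snapshot):
        # Keep a deep copy so later in-memory mutations can't alter the checkpoint.
        self._latest = copy.deepcopy(snapshot)

    def load_latest(self):
        return copy.deepcopy(self._latest)


class WorkerCrash(Exception):
    """Simulates the worker (and all of its in-memory state) disappearing."""


def run(source, store, checkpoint_every, crash_at=None):
    """Sum `source`, checkpointing (offset, state) every `checkpoint_every` elements.

    On start, resume from the latest checkpoint if there is one; otherwise start
    at offset 0 with empty state. `crash_at` makes the worker die just before
    it processes that offset.
    """
    snapshot = store.load_latest()
    if snapshot is None:
        offset, total = 0, 0
    else:
        offset, total = snapshot["offset"], snapshot["total"]

    while offset < len(source):
        if crash_at is not None and offset == crash_at:
            raise WorkerCrash(f"worker lost at offset {offset}")
        total += source[offset]
        offset += 1
        if offset % checkpoint_every == 0:
            # Source position and state are saved together, so they stay consistent.
            store.save({"offset": offset, "total": total})
    return total


events = list(range(1, 11))  # elements 1..10
store = CheckpointStore()
try:
    run(events, store, checkpoint_every=3, crash_at=7)
except WorkerCrash:
    pass  # progress made after the last checkpoint (offset 6) is lost
print(store.load_latest())  # {'offset': 6, 'total': 21}
print(run(events, store, checkpoint_every=3))  # 55, same as a failure-free run
```

Work done after the last checkpoint is simply recomputed from the source on restart, which is why the source has to be replayable. Real Flink checkpoints are taken asynchronously using checkpoint barriers, which this sketch does not attempt to model.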

On Sat, Apr 24, 2021 at 10:53 AM Evan Galpin wrote:

> Thanks, Reuven! I assume the semantics might differ significantly for
> other runners?
>
> Do you happen to know if the Dataflow storage model is documented
> anywhere, either in runner code or in documentation elsewhere?
>
> Thanks again,
> Evan
>
> On Sat, Apr 24, 2021 at 12:35 Reuven Lax wrote:
>
>> In the case of Dataflow, this storage is backed by a distributed storage
>> system that is separate from the worker nodes, so a crashing worker node
>> will not cause data loss.
>>
>> At present, though, the storage is tied to a single data center.
>>
>> Reuven
>>
>> On Sat, Apr 24, 2021 at 9:19 AM Evan Galpin wrote:
>>
>>> Hi all!
>>>
>>> First off, I apologize for potentially dredging up a topic that has
>>> been asked about a number of times before. However, I’m looking for
>>> slightly more (or different) information than I have seen so far:
>>>
>>> I’ve seen the phrase “durably committed” mentioned in a number of
>>> StackOverflow answers[1][2][3] to questions about streaming pipelines
>>> that read from unbounded sources like PubSub and Kafka.
>>>
>>> I’m curious to know more about the cases where “durably committed” data
>>> is materialized or, in the case of Dataflow, saved in “Dataflow internal
>>> storage”, such as when mutating state or running a GroupByKey (GBK).
>>>
>>> What durability/redundancy guarantees are there in these cases? Is
>>> “Dataflow internal storage” backed by something like Google Cloud Storage?
>>> If a pipeline has a single worker node holding materialized data that
>>> has not yet been written to a sink, what happens if that worker crashes
>>> and disappears? Can data be lost this way?
>>>
>>> Thanks!
>>> Evan
>>>
>>> [1] https://stackoverflow.com/a/66338947/6432284
>>> [2] https://stackoverflow.com/a/46750189/6432284
>>> [3] https://stackoverflow.com/a/37309304/6432284
>>>
>>
