Re: How Durable is “durably committed” Data?

Evan Galpin Sat, 24 Apr 2021 13:00:49 -0700

Thanks again for your help 🙂

Evan


On Sat, Apr 24, 2021 at 14:01 Reuven Lax <[email protected]> wrote:

> Yes - for example with Flink, periodic checkpoints are taken and you can
> configure those checkpoints to be saved in the cloud (for example, on
> Google Cloud Storage). Upon a failure, Flink will load the last saved
> checkpoint and restart processing from that point.
>
> On Sat, Apr 24, 2021 at 10:53 AM Evan Galpin <[email protected]>
> wrote:
>
>> Thanks Reuven! I assume for other runners the semantics might differ
>> significantly?
>>
>> Do you happen to know if the Dataflow storage model is documented
>> anywhere, either in runner code or in documentation elsewhere?
>>
>> Thanks again,
>> Evan
>>
>> On Sat, Apr 24, 2021 at 12:35 Reuven Lax <[email protected]> wrote:
>>
>>> In the case of Dataflow, storage is backed by a distributed storage
>>> system, and this storage is separate from the worker node. Crashing worker
>>> nodes will not cause data loss.
>>>
>>> At the present time though, the storage is tied to a single data center.
>>>
>>> Reuven
>>>
>>> On Sat, Apr 24, 2021 at 9:19 AM Evan Galpin <[email protected]>
>>> wrote:
>>>
>>>> Hi all!
>>>>
>>>> First off, I apologize for potentially dredging up a topic that has
>>>> been asked about a number of times before. However, I’m looking for
>>>> slightly different information than I have seen in earlier answers:
>>>>
>>>> A number of Stack Overflow answers[1][2][3] use the phrase “durably
>>>> committed” when responding to questions about streaming pipelines that
>>>> read from unbounded sources like Pub/Sub and Kafka.
>>>>
>>>> I’m curious to know more about the cases where “durably committed” data
>>>> is materialized or, in the case of Dataflow, saved in “Dataflow internal
>>>> storage”, such as when mutating state or running a GroupByKey (GBK).
>>>>
>>>> What durability/redundancy guarantees are there in these cases? Is
>>>> “Dataflow internal storage” backed by something like Google Cloud Storage?
>>>> If a pipeline has a single worker node holding materialized data that
>>>> has not yet been written to a sink, what happens if that worker crashes
>>>> and vanishes? Can data be lost this way?
>>>>
>>>> Thanks!
>>>> Evan
>>>>
>>>> [1]
>>>> https://stackoverflow.com/a/66338947/6432284
>>>> [2]
>>>> https://stackoverflow.com/a/46750189/6432284
>>>> [3]
>>>> https://stackoverflow.com/a/37309304/6432284
>>>>
>>>
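
The checkpoint-and-restore behaviour described for Flink in the reply quoted above can be sketched roughly as follows. This is only a toy illustration, not Flink or Beam code: `DurableStore`, `WorkerCrash`, `run_worker` and `run_with_recovery` are hypothetical names, the "remote storage" is an in-memory object standing in for something like a cloud bucket, and the source is assumed to be replayable from an offset (as Kafka or Pub/Sub would allow).

```python
import copy


class DurableStore:
    """Hypothetical stand-in for remote storage that outlives any worker."""

    def __init__(self):
        self._snapshot = None

    def save(self, offset, state):
        # Store a copy so later in-memory mutation cannot alter the checkpoint.
        self._snapshot = (offset, copy.deepcopy(state))

    def load(self):
        # With no checkpoint yet, start from the beginning with empty state.
        if self._snapshot is None:
            return 0, {"total": 0}
        offset, state = self._snapshot
        return offset, copy.deepcopy(state)


class WorkerCrash(Exception):
    """Simulates a worker disappearing together with its in-memory state."""


def run_worker(source, store, checkpoint_every, crash_at=None):
    # Always resume from the last checkpoint held in durable storage.
    offset, state = store.load()
    while offset < len(source):
        if crash_at is not None and offset == crash_at:
            raise WorkerCrash(f"worker lost at offset {offset}")
        state["total"] += source[offset]
        offset += 1
        # Periodically persist the read position together with the state.
        if offset % checkpoint_every == 0:
            store.save(offset, state)
    return state["total"]


def run_with_recovery(source, checkpoint_every, crash_at):
    store = DurableStore()
    try:
        return run_worker(source, store, checkpoint_every, crash_at)
    except WorkerCrash:
        # A replacement worker reloads the checkpoint and replays the source
        # from the saved offset; work done after the checkpoint is redone.
        return run_worker(source, store, checkpoint_every)
```

In this sketch, elements processed after the last checkpoint are simply reprocessed on restart, so the final result matches a run without failures only because the state and the read offset are saved together.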
