Dataflow does something like this; however, since work is load-balanced
across workers, a per-worker ID doesn't work very well. Dataflow divides
the keyspace into lexicographic ranges and creates a cache token per
range.
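
To make that concrete, here is a rough Python sketch of the range idea.
The names (RangeCacheTokens, split_points) are made up for illustration
and are not the actual Dataflow or Fn API surface:

    import bisect
    import uuid

    class RangeCacheTokens(object):
        """One cache token per lexicographic key range (illustrative only).

        split_points are hypothetical boundaries dividing the keyspace;
        keys are compared as plain strings.
        """

        def __init__(self, split_points):
            self._splits = sorted(split_points)
            # One token per range; range i covers [splits[i-1], splits[i]).
            self._tokens = [uuid.uuid4().bytes
                            for _ in range(len(self._splits) + 1)]

        def _range_index(self, key):
            return bisect.bisect_right(self._splits, key)

        def token_for(self, key):
            # Token the SDK worker attaches to state cached under this key.
            return self._tokens[self._range_index(key)]

        def invalidate(self, key):
            # Rotate the token for the range containing the key, e.g. when
            # the range moves to another worker or is mutated elsewhere;
            # entries cached under the old token can no longer be reused.
            self._tokens[self._range_index(key)] = uuid.uuid4().bytes

The nice property is that the number of tokens is bounded by the number of
ranges rather than the number of keys, while an ownership change only
invalidates the slice of the cache it actually touches.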

On Tue, Aug 20, 2019 at 8:35 PM Thomas Weise <t...@apache.org> wrote:

> Commenting here vs. on the PR since related to the overall approach.
>
> Wouldn't it be simpler to have the runner just track a unique ID for each
> worker and use that to communicate if the cache is valid or not?
>
> * When the bundle is started, the runner tells the worker if the cache has
> become invalid (since it knows if another worker has mutated state)
> * When the worker sends mutation requests to the runner, it includes its
> own ID (or the runner already has it as contextual information). No need to
> wait for a response.
> * When the bundle is finished, the runner records the last writer (only if
> a change occurred)
>
> Whenever the current worker ID and the last writer ID don't match, the
> cache is invalid.
>
> Thomas
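
(For comparison, the worker-ID / last-writer scheme described above could
look roughly like this; LastWriterTracker, start_bundle and record_write
are hypothetical names, not actual Fn API calls:

    class LastWriterTracker(object):
        """Runner-side sketch of the last-writer idea (illustrative only)."""

        def __init__(self):
            self._last_writer = {}  # state id -> worker ID of last mutation

        def start_bundle(self, worker_id, state_ids):
            # Tell the worker, per state id, whether its cache is still
            # valid: only if nobody wrote yet, or this worker wrote last.
            return {
                state_id: self._last_writer.get(state_id) in (None, worker_id)
                for state_id in state_ids}

        def record_write(self, worker_id, state_id):
            # Called when a mutation request from worker_id arrives;
            # no response needed.
            self._last_writer[state_id] = worker_id

As noted at the top of this mail, the catch is that work gets load-balanced
across workers, so a plain per-worker ID does not map cleanly onto what is
actually cached where.)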
>
>
> On Tue, Aug 20, 2019 at 11:42 AM Lukasz Cwik <lc...@google.com> wrote:
>
>> Having cache tokens per key would be very expensive indeed and I believe
>> we should go with a single cache token "per" bundle.
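
(A single-token-per-bundle variant might look roughly like this; again,
purely illustrative names rather than anything from the proto:

    import uuid

    class PerBundleCacheToken(object):
        """Illustrative only: one token handed to the SDK with each bundle."""

        def __init__(self):
            self._token = uuid.uuid4().bytes

        def token_for_next_bundle(self, state_untouched_since_last_bundle):
            # Keep handing out the same token while nothing else touched the
            # state, so the worker may reuse its cache across bundles;
            # otherwise rotate it and the worker starts with a cold cache.
            if not state_untouched_since_last_bundle:
                self._token = uuid.uuid4().bytes
            return self._token
)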
>>
>> On Mon, Aug 19, 2019 at 11:36 AM Maximilian Michels <m...@apache.org>
>> wrote:
>>
>>> Maybe a Beam Python expert can chime in for Rakesh's question?
>>>
>>> Luke, I was assuming cache tokens to be per key and state id. While
>>> implementing initial support on the Runner side, I realized that we
>>> probably want cache tokens to be per state id only. Note that if we had
>>> per-key cache tokens, the number of cache tokens would approach the
>>> total number of keys in an application.
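
(To illustrate the bookkeeping difference with hypothetical names, not the
code in the PR: with per-state-id tokens the runner keeps one token per
state id, and the SDK side can still cache individual keys by looking
entries up under (token, state id, key):

    import uuid

    class StateIdCacheTokens(object):
        """Illustrative only: one cache token per state id.

        The number of tokens stays proportional to the number of state ids
        in the pipeline rather than the number of keys the application sees.
        """

        def __init__(self):
            self._tokens = {}  # state id -> current token

        def token_for(self, state_id):
            return self._tokens.setdefault(state_id, uuid.uuid4().bytes)

        def invalidate(self, state_id):
            # Rotating the token makes every cached (old token, state id,
            # key) entry unreachable, so it simply ages out of the cache.
            self._tokens[state_id] = uuid.uuid4().bytes

Per-key tokens would instead need an entry here for every key ever written,
which is why their count approaches the total number of keys.)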
>>>
>>> If anyone wants to have a look, here is a first version of the Runner
>>> side for cache tokens. Note that I only implemented cache tokens for
>>> BagUserState for now, but it can be easily added for side inputs as well.
>>>
>>> https://github.com/apache/beam/pull/9374
>>>
>>> -Max
>>>
>>>
>>>