This is interesting because it leads to the question "why do the workers
need to compute it again instead of reusing the already computed one?".
Technically the answer is trivial, but in terms of design I think Beam
tends to abuse static initializer blocks, even in the DoFn API, which will
easily lead to issues when we want to support more than a main() (I am
thinking of OSGi, for instance).

So:

1. Why not use a standard programming model that is not based on static
initializers (<clinit>)? (Performance is not a valid answer, indeed.)
2. genId should probably be deprecated and considered a bad practice.

This looks like a detail, but for Beam 3 we should ensure we drop the
legacy that brings bad practices into our user code.
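
To make the trade-off concrete, here is a minimal sketch in plain Java with
no Beam dependency. TagIdSketch, UserPipeline, and the three helper methods
are hypothetical names for illustration; this is not Beam's actual genId
code. It assumes a call-site id built from class, method, and line number,
which is only one possible way to derive an id from the stack. It compares
an explicit id, a random UUID, and an id derived from the static
initializer frame, as in the scheme described in the quoted message below.

```java
import java.util.UUID;

// Hypothetical stand-in for tag id generation; not Beam's TupleTag.
final class TagIdSketch {

    // Explicit id: the same literal yields the same id on every JVM.
    public static String explicitId(String name) {
        return "tag:" + name;
    }

    // Random id: differs on every call, so two JVMs never agree
    // unless the value is shipped to them (e.g. by serialization).
    public static String randomId() {
        return UUID.randomUUID().toString();
    }

    // Call-site id: built from the first stack frame outside this class.
    // The same version of the code gives the same class, method and line,
    // hence the same id on every JVM; a refactoring may change it.
    public static String callSiteId() {
        for (StackTraceElement frame : new Throwable().getStackTrace()) {
            if (!frame.getClassName().equals(TagIdSketch.class.getName())) {
                return frame.getClassName() + "#" + frame.getMethodName()
                        + ":" + frame.getLineNumber();
            }
        }
        return "unknown";
    }

    // Hypothetical user code that creates its tag id in a static initializer.
    public static final class UserPipeline {
        public static final String TAG_ID;

        static {
            // The calling frame here is the static initializer, "<clinit>".
            TAG_ID = callSiteId();
        }
    }

    public static void main(String[] args) {
        System.out.println("explicit:  " + explicitId("main-output"));
        System.out.println("random:    " + randomId());
        System.out.println("call site: " + UserPipeline.TAG_ID);
    }
}
```

The call-site id is stable only for one version of the code, which is the
refactoring concern raised in the thread. The explicit id avoids the stack
walk entirely; in Beam that corresponds to the TupleTag constructor taking
an explicit id, as suggested in the quoted thread.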

On Apr 10, 2018, 20:15, "Ben Chambers" <bchamb...@apache.org> wrote:

> I believe it doesn't need to be stable across refactoring, only across all
> workers executing a specific version of the code. Specifically, it is used
> as follows:
>
> 1. Create a pipeline on the user's machine. It walks the stack until the
> static initializer block, which provides an ID.
> 2. Send the pipeline to many worker machines.
> 3. Each worker machine walks the stack until the static initializer block
> (on the same version of the code), receiving the same ID.
>
> This ensures that the TupleTag is the same on all the workers, as well as
> on the user's machine, which is critical since it is used as an identifier
> across these machines.
>
> Assigning a UUID would work if all of the machines agreed on the same
> tuple ID, which could be accomplished with serialization. Serialization,
> however, doesn't work well with static initializers, since those will have
> been called to initialize the class at load time.
>
> On Tue, Apr 10, 2018 at 10:27 AM Romain Manni-Bucau <rmannibu...@gmail.com>
> wrote:
>
>> Well, the issue is more about all the existing tests currently.
>>
>> Out of curiosity: how is walking the stack stable, since the stack can
>> change? The stop condition is the static block of a class, which can call
>> methods, so it is subject to refactoring and therefore not stable. Should
>> it be deprecated?
>>
>>
>> On Apr 10, 2018, 19:17, "Robert Bradshaw" <rober...@google.com> wrote:
>>
>> If it's too slow perhaps you could use the constructor where you pass an
>> explicit id (though in my experience walking the stack isn't that slow).
>>
>> On Tue, Apr 10, 2018 at 10:09 AM Romain Manni-Bucau <
>> rmannibu...@gmail.com> wrote:
>>
>>> Oops, cross-post, sorry.
>>>
>>> The issue I hit in this thread is that it is used a lot in tests and it
>>> slows down tests for nothing, as with the GenerateSequence ones.
>>>
>>> On Apr 10, 2018, 19:00, "Romain Manni-Bucau" <rmannibu...@gmail.com>
>>> wrote:
>>>
>>>>
>>>>
>>>> On Apr 10, 2018, 18:40, "Robert Bradshaw" <rober...@google.com>
>>>> wrote:
>>>>
>>>> These values should be, inasmuch as possible, stable across VMs. How
>>>> slow is slow? Doesn't this happen only once per VM startup?
>>>>
>>>>
>>>> Once per JVM, and IDEA launches a JVM per test; whatever time the
>>>> daemon saves, you still go through the whole project and check all
>>>> upstream deps, it seems.
>>>>
>>>> It is <1s with Maven vs 5-6s with Gradle.
>>>>
>>>>
>>>> On Tue, Apr 10, 2018 at 9:33 AM Romain Manni-Bucau <
>>>> rmannibu...@gmail.com> wrote:
>>>>
>>>>> Hi
>>>>>
>>>>> Does org.apache.beam.sdk.values.TupleTag#genId need to get the
>>>>> stack trace, or can we use any id generator (like
>>>>> UUID.randomUUID().toString())? Using stack traces is quite slow under
>>>>> load and in environments where the root of the stack is not just the
>>>>> "next" level, so skipping it would be nice.
>>>>>
>>>>> Romain Manni-Bucau
>>>>> @rmannibucau
>>>>>
>>>>
>>>>
>>
