On Tue, Apr 10, 2018 at 12:10 PM Romain Manni-Bucau <rmannibu...@gmail.com>
wrote:

> This is interesting because it leads to "why do the workers need to do it
> again instead of reusing the computed one?". Technically the answer is
> trivial, but in terms of design I think Beam tends to abuse static init
> blocks - even in the DoFn API - which easily leads to issues when we want to
> support more than a main (thinking of OSGi for instance).
>
> So:
>
> 1. Why not use a standard programming model that is not clinit-based?
> (Performance is not a valid answer, indeed.)
>

The Java language (as far as I know) doesn't have the ability to prohibit
assigning values (such as TupleTags) to static members. We can, however,
detect this (which is what the current code does). It doesn't seem
to me that code like

public class MyDoFn {
    public static final TupleTag<String> SOME_OUTPUT_TAG = new TupleTag<>();
    ...
}

is "bad practice," especially as this tag will need to be referenced in
multiple places.
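
As a rough illustration of that detection (a sketch only, not Beam's actual
genId code): the JVM reports a class's static initializer as a stack frame
named "<clinit>", so an id generator can look for that frame and derive an id
from the class name plus a per-class counter. TagIdSketch, STATIC_INIT_COUNTS
and the String ids standing in for TupleTag are made-up names for this example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

public class TagIdSketch {
    // How many ids each class has requested from its static initializer so far.
    private static final Map<String, Integer> STATIC_INIT_COUNTS = new HashMap<>();

    /** Deterministic id when called from a static initializer, random id otherwise. */
    static synchronized String genId() {
        for (StackTraceElement frame : new Throwable().getStackTrace()) {
            // A class's static initializer shows up as the method "<clinit>".
            if ("<clinit>".equals(frame.getMethodName())) {
                int n = STATIC_INIT_COUNTS.merge(frame.getClassName(), 1, Integer::sum) - 1;
                return frame.getClassName() + "#" + n;
            }
        }
        // No static initializer on the stack: fall back to a random id.
        return UUID.randomUUID().toString();
    }

    /** Hypothetical DoFn-like class; the String ids stand in for TupleTag instances. */
    static class MyDoFn {
        static final String SOME_OUTPUT_TAG_ID = genId();
        static final String OTHER_OUTPUT_TAG_ID = genId();
    }

    public static void main(String[] args) {
        System.out.println(MyDoFn.SOME_OUTPUT_TAG_ID);  // same on every JVM running this code
        System.out.println(MyDoFn.OTHER_OUTPUT_TAG_ID); // same on every JVM running this code
        System.out.println(genId());                    // random, differs per call
    }
}
```

Static fields are initialized in textual order, so every JVM that loads the
same version of MyDoFn assigns the two ids in the same order, while the id
requested from main is different on each call.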


> 2. genId should probably be deprecated and considered bad practice
>

Is the proposal that we require the user to manually provide unique
identifiers everywhere? Or only for the static case like the one above?
(Note that accidentally re-using identifiers can lead to subtly incorrect
pipeline results.)
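
To make the collision risk concrete, here is a minimal sketch that assumes
outputs are routed purely by tag id. Tag, Outputs and run are hypothetical
stand-ins written for this example, not Beam classes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TagCollisionSketch {
    /** Hypothetical stand-in for a tag: nothing but a user-chosen id. */
    static final class Tag {
        final String id;
        Tag(String id) { this.id = id; }
    }

    /** Hypothetical stand-in for a runner that routes elements by tag id. */
    static final class Outputs {
        private final Map<String, List<String>> byId = new HashMap<>();

        void emit(Tag tag, String element) {
            byId.computeIfAbsent(tag.id, k -> new ArrayList<>()).add(element);
        }

        List<String> get(Tag tag) {
            return byId.getOrDefault(tag.id, new ArrayList<>());
        }
    }

    /** Emits one element to each of the two tags and returns the routed outputs. */
    static Outputs run(Tag valid, Tag invalid) {
        Outputs out = new Outputs();
        out.emit(valid, "good-record");
        out.emit(invalid, "bad-record");
        return out;
    }

    public static void main(String[] args) {
        Tag valid = new Tag("out");
        Tag invalid = new Tag("out"); // copy-paste mistake: same id as above
        // No error is raised; the two outputs are silently merged.
        System.out.println(run(valid, invalid).get(valid)); // [good-record, bad-record]
    }
}
```

Nothing fails here, which is the problem: the bad record simply shows up in
the output meant for good records.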

> This looks like a detail, but for Beam 3 we should ensure we drop the legacy
> that brings bad practices into our user code.
>
> On Apr 10, 2018 at 20:15, "Ben Chambers" <bchamb...@apache.org> wrote:
>
>> I believe it doesn't need to be stable across refactoring, only across
>> all workers executing a specific version of the code. Specifically, it is
>> used as follows:
>>
>> 1. Create a pipeline on the user's machine. It walks the stack until the
>> static initializer block, which provides an ID.
>> 2. Send the pipeline to many worker machines.
>> 3. Each worker machine walks the stack until the static initializer block
>> (on the same version of the code), receiving the same ID.
>>
>> This ensures that the TupleTag is the same on all the workers, as well as
>> on the user's machine, which is critical since it is used as an identifier
>> across these machines.
>>
>> Assigning a UUID would work if all of the machines agreed on the same
>> tuple ID, which could be accomplished with serialization. Serialization,
>> however, doesn't work well with static initializers, since those will have
>> been called to initialize the class at load time.
>>
>> On Tue, Apr 10, 2018 at 10:27 AM Romain Manni-Bucau <
>> rmannibu...@gmail.com> wrote:
>>
>>> Well, the issue is more about all the existing tests currently.
>>>
>>> Out of curiosity: how is walking the stack stable, since the stack can
>>> change? The stop condition is the static block of a class, which can call
>>> methods and be refactored, and is therefore not stable. Should it be
>>> deprecated?
>>>
>>>
>>> On Apr 10, 2018 at 19:17, "Robert Bradshaw" <rober...@google.com> wrote:
>>>
>>> If it's too slow perhaps you could use the constructor where you pass an
>>> explicit id (though in my experience walking the stack isn't that slow).
>>>
>>> On Tue, Apr 10, 2018 at 10:09 AM Romain Manni-Bucau <
>>> rmannibu...@gmail.com> wrote:
>>>
>>>> Oops cross post sorry.
>>>>
>>>> The issue I hit on this thread is that it is used a lot in tests and it
>>>> slows down tests for nothing, like with the GenerateSequence ones.
>>>>
>>>> On Apr 10, 2018 at 19:00, "Romain Manni-Bucau" <rmannibu...@gmail.com>
>>>> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Apr 10, 2018 at 18:40, "Robert Bradshaw" <rober...@google.com>
>>>>> wrote:
>>>>>
>>>>> These values should be, inasmuch as possible, stable across VMs. How
>>>>> slow is slow? Doesn't this happen only once per VM startup?
>>>>>
>>>>>
>>>>> Once per JVM, and IDEA launches a JVM per test, and the daemon doesn't
>>>>> save enough time; you still go through the whole project and check all
>>>>> upstream deps, it seems.
>>>>>
>>>>> It is <1s with Maven vs 5-6s with Gradle.
>>>>>
>>>>>
>>>>> On Tue, Apr 10, 2018 at 9:33 AM Romain Manni-Bucau <
>>>>> rmannibu...@gmail.com> wrote:
>>>>>
>>>>>> Hi
>>>>>>
>>>>>> Does org.apache.beam.sdk.values.TupleTag#genId need to get the
>>>>>> stack trace, or can we use any id generator (like
>>>>>> UUID.randomUUID().toString())? Using traces is quite slow under load and
>>>>>> in environments where the root stack is not just the "next" level, so
>>>>>> skipping it would be nice.
>>>>>>
>>>>>> Romain Manni-Bucau
>>>>>>
>>>>>
>>>>>
>>>
