On Tue, Apr 10, 2018 at 1:49 PM Romain Manni-Bucau <rmannibu...@gmail.com>
wrote:

>
> Le 10 avr. 2018 21:25, "Robert Bradshaw" <rober...@google.com> a écrit :
>
> On Tue, Apr 10, 2018 at 12:10 PM Romain Manni-Bucau <rmannibu...@gmail.com>
> wrote:
>
>> This is interesting cause it leads to "why do the workers need to do it
>> again instead of reusing the computed one?". Technically the answer is
>> trivial but in terms of design I think beam tends to abuse static init
>> block - even in dofn api - which easily lead to issues when we will want to
>> support more than a main (thinking to OSGi for instance).
>>
>> So:
>>
>> 1. Why not using a standard programming model not cinit based? (Perf are
>> not a valid answer indeed)
>>
>
> The Java language (as far as I know) doesn't have the ability to prohibit
> assigning static values (such as TupleTags) as static members. We can,
> however, detect this (which is what the current code does). It doesn't seem
> to me that code like
>
> public class MyDoFn {
>     public static final TupleTag<String> SOME_OUTPUT_TAG = new
> TupleTag<>();
>     ...
> }
>
> is "bad practice," especially as this tag will need to be referenced in
> multiple places.
>
>
> It is as soon as you dont run in a flat classpath env. In flat cp it is
> acceptable and dont have much side effects...but beam doesnt know where it
> runs ;).
>

The problem is, people *will* write this.


> 2. GenId should probably be deprecated and considered a bad practise
>>
>
> Is the proposal that we require the user to manually provide unique
> identifiers everywhere? Or for static case like above? (Note that
> accidentally re-using identifiers can lead to subtle incorrect pipeline
> results.)
>
>
> Yep.
>

Yep to which?


> And ensure we can serialize a tupletag with an already uuid-generated id
> for instance.
>

Yes, we already do this.


>
>
> This looks like a detail but for beam 3 we should ensure we drop the
>> legacy bringing bad practises in our user code.
>>
>> Le 10 avr. 2018 20:15, "Ben Chambers" <bchamb...@apache.org> a écrit :
>>
>>> I believe it doesn't need to be stable across refactoring, only across
>>> all workers executing a specific version of the code. Specifically, it is
>>> used as follows:
>>>
>>> 1. Create a pipeline on the user's machine. It walks the stack until the
>>> static initializer block, which provides an ID.
>>> 2. Send the pipeline to many worker machines.
>>> 3. Each worker machine walks the stack until the static initializer
>>> block (on the same version of the code), receiving the same ID.
>>>
>>> This ensures that the tupletag is the same on all the workers, as well
>>> as on the user's machine, which is critical since it used as an identifier
>>> across these machines.
>>>
>>> Assigning a UUID would work if all of the machines agreed on the same
>>> tuple ID, which could be accomplished with serialization. Serialization,
>>> however, doesn't work well with static initializers, since those will have
>>> been called to initialize the class at load time.
>>>
>>> On Tue, Apr 10, 2018 at 10:27 AM Romain Manni-Bucau <
>>> rmannibu...@gmail.com> wrote:
>>>
>>>> Well issue is more about all the existing tests currently.
>>>>
>>>> Out of curiosity: how walking the stack is stable since the stack can
>>>> change? Stop condition is the static block of a class which can use method
>>>> so refactoring and therefore is not stable. Should it be deprecated?
>>>>
>>>>
>>>> Le 10 avr. 2018 19:17, "Robert Bradshaw" <rober...@google.com> a
>>>> écrit :
>>>>
>>>> If it's too slow perhaps you could use the constructor where you pass
>>>> an explicit id (though in my experience walking the stack isn't that slow).
>>>>
>>>> On Tue, Apr 10, 2018 at 10:09 AM Romain Manni-Bucau <
>>>> rmannibu...@gmail.com> wrote:
>>>>
>>>>> Oops cross post sorry.
>>>>>
>>>>> Issue i hit on this thread is it is used a lot in tests abd it slows
>>>>> down tests for nothing like with generatesequence ones
>>>>>
>>>>> Le 10 avr. 2018 19:00, "Romain Manni-Bucau" <rmannibu...@gmail.com> a
>>>>> écrit :
>>>>>
>>>>>>
>>>>>>
>>>>>> Le 10 avr. 2018 18:40, "Robert Bradshaw" <rober...@google.com> a
>>>>>> écrit :
>>>>>>
>>>>>> These values should be, inasmuch as possible, stable across VMs. How
>>>>>> slow is slow? Doesn't this happen only once per VM startup?
>>>>>>
>>>>>>
>>>>>> Once per jvm and idea launches a jvm per test and the daemon does
>>>>>> save enough time, you still go through the whole project and check all
>>>>>> upstream deps it seems.
>>>>>>
>>>>>> It is <1s with maven vs 5-6s with gradle.
>>>>>>
>>>>>>
>>>>>> On Tue, Apr 10, 2018 at 9:33 AM Romain Manni-Bucau <
>>>>>> rmannibu...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi
>>>>>>>
>>>>>>> does org.apache.beam.sdk.values.TupleTag#genId need to get the
>>>>>>> stacktrace or can we use any id generator (like
>>>>>>> UUID.random().toString())? Using traces is quite slow under load and
>>>>>>> environments where the root stack is not just the "next" level so
>>>>>>> skipping it would be nice.
>>>>>>>
>>>>>>> Romain Manni-Bucau
>>>>>>> @rmannibucau |  Blog | Old Blog | Github | LinkedIn | Book
>>>>>>>
>>>>>>
>>>>>>
>>>>
>

Reply via email to