On Tue, Apr 10, 2018 at 1:49 PM Romain Manni-Bucau <rmannibu...@gmail.com> wrote:
>
> On Apr 10, 2018 at 21:25, "Robert Bradshaw" <rober...@google.com> wrote:
>
> On Tue, Apr 10, 2018 at 12:10 PM Romain Manni-Bucau <rmannibu...@gmail.com>
> wrote:
>
>> This is interesting because it leads to "why do the workers need to do it
>> again instead of reusing the computed one?". Technically the answer is
>> trivial but in terms of design I think beam tends to abuse static init
>> blocks - even in the dofn api - which easily leads to issues when we will
>> want to support more than a main (thinking of OSGi for instance).
>>
>> So:
>>
>> 1. Why not use a standard programming model that is not clinit based?
>> (Perf is not a valid answer indeed)
>>
>
> The Java language (as far as I know) doesn't have the ability to prohibit
> assigning static values (such as TupleTags) as static members. We can,
> however, detect this (which is what the current code does). It doesn't seem
> to me that code like
>
> public class MyDoFn {
>   public static final TupleTag<String> SOME_OUTPUT_TAG = new TupleTag<>();
>   ...
> }
>
> is "bad practice," especially as this tag will need to be referenced in
> multiple places.
>
>
> It is as soon as you don't run in a flat classpath env. In a flat cp it is
> acceptable and doesn't have many side effects...but beam doesn't know where
> it runs ;).
>

The problem is, people *will* write this.

>> 2. GenId should probably be deprecated and considered a bad practice
>>
>
> Is the proposal that we require the user to manually provide unique
> identifiers everywhere? Or for static case like above? (Note that
> accidentally re-using identifiers can lead to subtle incorrect pipeline
> results.)
>
>
> Yep.
>

Yep to which?

> And ensure we can serialize a tupletag with an already uuid-generated id
> for instance.
>

Yes, we already do this.

>
>
>> This looks like a detail but for beam 3 we should ensure we drop the
>> legacy bringing bad practices in our user code.
>>
>> On Apr 10, 2018 at 20:15, "Ben Chambers" <bchamb...@apache.org> wrote:
>>
>>> I believe it doesn't need to be stable across refactoring, only across
>>> all workers executing a specific version of the code. Specifically, it is
>>> used as follows:
>>>
>>> 1. Create a pipeline on the user's machine. It walks the stack until the
>>> static initializer block, which provides an ID.
>>> 2. Send the pipeline to many worker machines.
>>> 3. Each worker machine walks the stack until the static initializer
>>> block (on the same version of the code), receiving the same ID.
>>>
>>> This ensures that the tupletag is the same on all the workers, as well
>>> as on the user's machine, which is critical since it is used as an
>>> identifier across these machines.
>>>
>>> Assigning a UUID would work if all of the machines agreed on the same
>>> tuple ID, which could be accomplished with serialization. Serialization,
>>> however, doesn't work well with static initializers, since those will have
>>> been called to initialize the class at load time.
>>>
>>> On Tue, Apr 10, 2018 at 10:27 AM Romain Manni-Bucau
>>> <rmannibu...@gmail.com> wrote:
>>>
>>>> Well, the issue is more about all the existing tests currently.
>>>>
>>>> Out of curiosity: how is walking the stack stable since the stack can
>>>> change? The stop condition is the static block of a class, which can
>>>> call methods, so it can change under refactoring and is therefore not
>>>> stable. Should it be deprecated?
>>>>
>>>>
>>>> On Apr 10, 2018 at 19:17, "Robert Bradshaw" <rober...@google.com> wrote:
>>>>
>>>> If it's too slow perhaps you could use the constructor where you pass
>>>> an explicit id (though in my experience walking the stack isn't that
>>>> slow).
>>>>
>>>> On Tue, Apr 10, 2018 at 10:09 AM Romain Manni-Bucau
>>>> <rmannibu...@gmail.com> wrote:
>>>>
>>>>> Oops cross post sorry.
>>>>>
>>>>> The issue I hit on this thread is that it is used a lot in tests and it
>>>>> slows down tests for nothing, like with the generatesequence ones
>>>>>
>>>>> On Apr 10, 2018 at 19:00, "Romain Manni-Bucau" <rmannibu...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> On Apr 10, 2018 at 18:40, "Robert Bradshaw" <rober...@google.com>
>>>>>> wrote:
>>>>>>
>>>>>> These values should be, inasmuch as possible, stable across VMs. How
>>>>>> slow is slow? Doesn't this happen only once per VM startup?
>>>>>>
>>>>>>
>>>>>> Once per jvm and idea launches a jvm per test and the daemon does
>>>>>> save enough time, you still go through the whole project and check all
>>>>>> upstream deps it seems.
>>>>>>
>>>>>> It is <1s with maven vs 5-6s with gradle.
>>>>>>
>>>>>>
>>>>>> On Tue, Apr 10, 2018 at 9:33 AM Romain Manni-Bucau
>>>>>> <rmannibu...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi
>>>>>>>
>>>>>>> does org.apache.beam.sdk.values.TupleTag#genId need to get the
>>>>>>> stacktrace or can we use any id generator (like
>>>>>>> UUID.randomUUID().toString())? Using traces is quite slow under load
>>>>>>> and in environments where the root stack is not just the "next" level
>>>>>>> so skipping it would be nice.
>>>>>>>
>>>>>>> Romain Manni-Bucau
>>>>>>> @rmannibucau | Blog | Old Blog | Github | LinkedIn | Book
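
For readers following the thread, the scheme Ben Chambers describes (an ID taken from the static initializer frame, so that every JVM running the same code version computes the same value) can be sketched roughly as below. This is only an illustration under stated assumptions, not Beam's TupleTag code: the class name TagIdSketch, this genId method, the nested MyDoFn stand-in, and the "ClassName#line" ID format are all made up for the example. It also shows why a plain UUID would differ on each machine unless it is serialized along with the tag.

```java
import java.util.UUID;

// Hypothetical illustration only; NOT Beam's actual TupleTag implementation.
final class TagIdSketch {

    /**
     * Walks the current stack looking for a static initializer frame, which the
     * JVM reports under the method name "<clinit>". If one is found, the ID is
     * derived from that frame (class name plus line number), so any JVM that
     * loads the same compiled class derives the same ID. Otherwise it falls
     * back to a random UUID, which is unique but differs on every call.
     */
    static String genId() {
        for (StackTraceElement frame : new Throwable().getStackTrace()) {
            if ("<clinit>".equals(frame.getMethodName())) {
                return frame.getClassName() + "#" + frame.getLineNumber();
            }
        }
        return UUID.randomUUID().toString();
    }

    /** Stand-in for a user class that keeps a tag ID in a static field. */
    static final class MyDoFn {
        // Computed inside MyDoFn's static initializer, hence deterministic.
        static final String SOME_OUTPUT_TAG_ID = genId();
    }

    public static void main(String[] args) {
        // Same value on every JVM running this exact version of the class.
        System.out.println("static-init id: " + MyDoFn.SOME_OUTPUT_TAG_ID);
        // Called outside any static initializer: a fresh UUID each time.
        System.out.println("fallback id:    " + genId());
        System.out.println("fallback id:    " + genId());
    }
}
```

Note that the ID in this sketch changes whenever the declaring line moves, which matches the point made in the thread: it only needs to be stable across workers running one version of the code, not across refactorings.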