I think it's best for the users to pick their own identifier for TupleTags.
As part of the snapshot/update proposal, we want to be able update
pipelines across refactorings. We tell people to supply a stable name for
transforms (in apply) specifically to make this doable (so we can match up
transforms between the old and new graphs). However if side inputs have
names that are unstable across refactoring, this becomes impossible to do
(impossible even for the user, since the user has no insight into these
auto-generated ids).

Reuven

On Wed, Apr 11, 2018 at 4:58 AM Romain Manni-Bucau <rmannibu...@gmail.com>
wrote:

>
>
> Le 10 avr. 2018 22:59, "Robert Bradshaw" <rober...@google.com> a écrit :
>
> On Tue, Apr 10, 2018 at 1:49 PM Romain Manni-Bucau <rmannibu...@gmail.com>
> wrote:
>
>>
>> Le 10 avr. 2018 21:25, "Robert Bradshaw" <rober...@google.com> a écrit :
>>
>> On Tue, Apr 10, 2018 at 12:10 PM Romain Manni-Bucau <
>> rmannibu...@gmail.com> wrote:
>>
>>> This is interesting cause it leads to "why do the workers need to do it
>>> again instead of reusing the computed one?". Technically the answer is
>>> trivial but in terms of design I think beam tends to abuse static init
>>> block - even in dofn api - which easily lead to issues when we will want to
>>> support more than a main (thinking to OSGi for instance).
>>>
>>> So:
>>>
>>> 1. Why not using a standard programming model not cinit based? (Perf are
>>> not a valid answer indeed)
>>>
>>
>> The Java language (as far as I know) doesn't have the ability to prohibit
>> assigning static values (such as TupleTags) as static members. We can,
>> however, detect this (which is what the current code does). It doesn't seem
>> to me that code like
>>
>> public class MyDoFn {
>>     public static final TupleTag<String> SOME_OUTPUT_TAG = new
>> TupleTag<>();
>>     ...
>> }
>>
>> is "bad practice," especially as this tag will need to be referenced in
>> multiple places.
>>
>>
>> It is as soon as you dont run in a flat classpath env. In flat cp it is
>> acceptable and dont have much side effects...but beam doesnt know where it
>> runs ;).
>>
>
> The problem is, people *will* write this.
>
>
>
> This is ok and if thegenid/default constructor is deprecated it is ok.
>
> Note however the dofn api encourages this pattern for state and timers
> instead of passing them as parameters or field (not static) injections
> which would be saner.
>
>
>
>> 2. GenId should probably be deprecated and considered a bad practise
>>>
>>
>> Is the proposal that we require the user to manually provide unique
>> identifiers everywhere? Or for static case like above? (Note that
>> accidentally re-using identifiers can lead to subtle incorrect pipeline
>> results.)
>>
>>
>> Yep.
>>
>
> Yep to which?
>
>
>
> Force a value. Static or not is a user choice we should just advice to not
> be static.
>
>
>
>> And ensure we can serialize a tupletag with an already uuid-generated id
>> for instance.
>>
>
> Yes, we already do this.
>
>
>>
>>
>> This looks like a detail but for beam 3 we should ensure we drop the
>>> legacy bringing bad practises in our user code.
>>>
>>> Le 10 avr. 2018 20:15, "Ben Chambers" <bchamb...@apache.org> a écrit :
>>>
>>>> I believe it doesn't need to be stable across refactoring, only across
>>>> all workers executing a specific version of the code. Specifically, it is
>>>> used as follows:
>>>>
>>>> 1. Create a pipeline on the user's machine. It walks the stack until
>>>> the static initializer block, which provides an ID.
>>>> 2. Send the pipeline to many worker machines.
>>>> 3. Each worker machine walks the stack until the static initializer
>>>> block (on the same version of the code), receiving the same ID.
>>>>
>>>> This ensures that the tupletag is the same on all the workers, as well
>>>> as on the user's machine, which is critical since it used as an identifier
>>>> across these machines.
>>>>
>>>> Assigning a UUID would work if all of the machines agreed on the same
>>>> tuple ID, which could be accomplished with serialization. Serialization,
>>>> however, doesn't work well with static initializers, since those will have
>>>> been called to initialize the class at load time.
>>>>
>>>> On Tue, Apr 10, 2018 at 10:27 AM Romain Manni-Bucau <
>>>> rmannibu...@gmail.com> wrote:
>>>>
>>>>> Well issue is more about all the existing tests currently.
>>>>>
>>>>> Out of curiosity: how walking the stack is stable since the stack can
>>>>> change? Stop condition is the static block of a class which can use method
>>>>> so refactoring and therefore is not stable. Should it be deprecated?
>>>>>
>>>>>
>>>>> Le 10 avr. 2018 19:17, "Robert Bradshaw" <rober...@google.com> a
>>>>> écrit :
>>>>>
>>>>> If it's too slow perhaps you could use the constructor where you pass
>>>>> an explicit id (though in my experience walking the stack isn't that 
>>>>> slow).
>>>>>
>>>>> On Tue, Apr 10, 2018 at 10:09 AM Romain Manni-Bucau <
>>>>> rmannibu...@gmail.com> wrote:
>>>>>
>>>>>> Oops cross post sorry.
>>>>>>
>>>>>> Issue i hit on this thread is it is used a lot in tests abd it slows
>>>>>> down tests for nothing like with generatesequence ones
>>>>>>
>>>>>> Le 10 avr. 2018 19:00, "Romain Manni-Bucau" <rmannibu...@gmail.com>
>>>>>> a écrit :
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Le 10 avr. 2018 18:40, "Robert Bradshaw" <rober...@google.com> a
>>>>>>> écrit :
>>>>>>>
>>>>>>> These values should be, inasmuch as possible, stable across VMs. How
>>>>>>> slow is slow? Doesn't this happen only once per VM startup?
>>>>>>>
>>>>>>>
>>>>>>> Once per jvm and idea launches a jvm per test and the daemon does
>>>>>>> save enough time, you still go through the whole project and check all
>>>>>>> upstream deps it seems.
>>>>>>>
>>>>>>> It is <1s with maven vs 5-6s with gradle.
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Apr 10, 2018 at 9:33 AM Romain Manni-Bucau <
>>>>>>> rmannibu...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi
>>>>>>>>
>>>>>>>> does org.apache.beam.sdk.values.TupleTag#genId need to get the
>>>>>>>> stacktrace or can we use any id generator (like
>>>>>>>> UUID.random().toString())? Using traces is quite slow under load and
>>>>>>>> environments where the root stack is not just the "next" level so
>>>>>>>> skipping it would be nice.
>>>>>>>>
>>>>>>>> Romain Manni-Bucau
>>>>>>>> @rmannibucau |  Blog | Old Blog | Github | LinkedIn | Book
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>
>>
>

Reply via email to