2018-02-18 18:00 GMT+01:00 Eugene Kirpichov <[email protected]>:

>
>
> On Sun, Feb 18, 2018, 2:06 AM Romain Manni-Bucau <[email protected]>
> wrote:
>
>>
>>
>> On 18 Feb 2018 at 00:23, "Kenneth Knowles" <[email protected]> wrote:
>>
>> On Sat, Feb 17, 2018 at 3:09 PM, Romain Manni-Bucau <
>> [email protected]> wrote:
>>>
>>> If you give an example of a high-level need (e.g. "I'm trying to write
>>> an IO for system $x and it requires the following initialization and the
>>> following cleanup logic and the following processing in between") I'll be
>>> better able to help you.
>>>
>>>
>>> Take a simple example: a transform that requires a connection. Opening
>>> and closing it per bundle is a performance killer since bundle size is not
>>> controlled. Using teardown doesn't let you release the connection reliably
>>> since it is a best-effort thing. Not releasing the connection either makes
>>> you pay a lot - AWS ;) - or prevents you from launching other processing -
>>> concurrency limit.
>>>
>>
>> For this example @Teardown is an exact fit. If things die so badly that
>> @Teardown is not called then nothing else can be called to close the
>> connection either. What AWS service are you thinking of that stays open for
>> a long time when everything at the other end has died?
>>
>>
>> You assume connections are more or less stateless, but some (proprietary)
>> protocols require a closing exchange that is more than just "I'm leaving".
>>
>> For AWS I was thinking of starting some services - machines - on the fly
>> at pipeline startup and shutting them down at the end. If teardown is not
>> called you leak machines and money. You can say it can be done another
>> way... as can the full pipeline ;).
>>
>> I don't want to be picky, but if Beam can't handle the lifecycle of its
>> components it can't be used at scale for generic pipelines and stays bound
>> to some particular IOs.
>>
>> What prevents enforcing teardown - ignoring the interstellar-crash case
>> which can't be handled by any human system? Nothing, technically. Why do
>> you push for not handling it? Is it due to some legacy code in Dataflow or
>> something else?
>>
> Teardown *is* already documented and implemented this way (best-effort).
> So I'm not sure what kind of change you're asking for.
>

Remove "best effort" from the javadoc. If it is not call then it is a bug
and we are done :).
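
For reference, the pattern under discussion is roughly the DoFn below. This
is a sketch only: SomeClient, SomeConnection and closeGracefully() are
made-up placeholders for a stateful, proprietary protocol, not a real API.

import org.apache.beam.sdk.transforms.DoFn;

class WriteToServiceFn extends DoFn<String, Void> {
  // The connection is expensive and stateful; it is opened once per DoFn
  // instance, not once per bundle.
  private transient SomeConnection connection;

  @Setup
  public void setup() {
    connection = SomeClient.connect("service-endpoint");
  }

  @ProcessElement
  public void processElement(ProcessContext c) {
    connection.send(c.element());
  }

  @Teardown
  public void teardown() {
    // The point of this thread: if teardown is only best-effort, this
    // closing exchange may never happen and the remote resource leaks.
    if (connection != null) {
      connection.closeGracefully();
    }
  }
}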


>
>
>> Also, what does it mean for users? The direct runner does call it, so if a
>> user relies on that reference implementation in tests, will he get
>> different behavior in prod? Also don't forget the user doesn't know what
>> the IOs he composes use internally, so this impacts the whole product so
>> much that it must be handled IMHO.
>>
>> I understand the portability culture is new in the big data world, but it
>> is not a reason to ignore what people have done for years and to do it
>> wrong before doing it right ;).
>>
>> My proposal is to list what can prevent guaranteeing - under normal IT
>> conditions - the execution of teardown. Then we see if we can handle each
>> case, and only if there is a technical reason we can't do we make it
>> experimental/unsupported in the API. I know Spark and Flink can; is there
>> any known blocker for other runners?
>>
>> Technical note: even a kill should go through Java shutdown hooks,
>> otherwise your environment (the software enclosing Beam) is fully
>> unhandled and your overall system is uncontrolled. The only case where
>> that is not true is when the software is always owned by a vendor and
>> never installed in a customer environment. In that case it belongs to the
>> vendor to handle the Beam API, and not to Beam to adjust its API for a
>> vendor - otherwise every feature unsupported by one runner should be made
>> optional, right?
>>
>> Not all state is about the network, even in distributed systems, so it is
>> key to have an explicit and well-defined lifecycle.
>>
>>
>> Kenn
>>
>>
>>
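
For reference, the JVM shutdown-hook mechanism mentioned in the technical
note above - a minimal, Beam-independent sketch:

public class ShutdownHookDemo {
  public static void main(String[] args) throws InterruptedException {
    // A normal kill (SIGTERM) or Ctrl+C still runs registered hooks, so the
    // enclosing application gets a chance to release external resources.
    // Only a hard SIGKILL (kill -9) bypasses them.
    Runtime.getRuntime().addShutdownHook(new Thread(() -> {
      // Close connections, stop rented machines, send the closing
      // handshake of a stateful protocol, etc.
      System.out.println("shutdown hook: releasing resources");
    }));

    System.out.println("running; send SIGTERM or hit Ctrl+C");
    Thread.sleep(60_000);
  }
}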
