+1 for on by default and ability to turn it off for entire application. - Tushar.
On Thu, Oct 13, 2016 at 11:00 AM, Pradeep A. Dalvi <[email protected]> wrote: > +1 for ON by default > +1 for disabling it for all output ports > > With the kind of issues we have observed being faced by developers in the > past, I strongly believe this check should be ON by default. > However at the same time I feel, it shall be one-time check, mostly in > Development phase and before going into Production. Having said that, if > disabling it at application level i.e. for all operators and their > respective output ports would it make implementation simpler, then that can > be targeted first. Thoughts? > > --prad > > On Thu, Oct 13, 2016 at 7:32 AM, Vlad Rozov <[email protected]> wrote: > >> I run jmh test and check takes 1ns on my MacBook Pro and on the lab >> machine. This corresponds to 3% degradation at 30 million events/second. I >> think we can move forward with the check ON by default. Do we need an >> ability to turn OFF check for a specific operator and/or port? My thought >> is that such ability is not necessary and it should be OK to disable check >> for all output ports in an application. >> >> Vlad >> >> >> On 10/12/16 11:56, Amol Kekre wrote: >> >>> In case there turns out to be a penalty, we can introduce a "check for >>> thread affinity" mode that triggers this check. My initial thought is to >>> make this check ON by default. We should wait till benchmarks are >>> available >>> before discussing adding this check. >>> >>> Thks >>> Amol >>> >>> >>> On Wed, Oct 12, 2016 at 11:07 AM, Sanjay Pujare <[email protected]> >>> wrote: >>> >>> A JIRA has been created for adding this thread affinity check >>>> https://issues.apache.org/jira/browse/APEXCORE-510 . I have made this >>>> enhancement in a branch >>>> https://github.com/sanjaypujare/apex-core/tree/malhar-510. >>>> thread_affinity >>>> and I have been benchmarking the performance with this change. I will be >>>> publishing the results in the above JIRA where we can discuss them and >>>> hopefully agree on merging this change. >>>> >>>> On Thu, Aug 11, 2016 at 1:41 PM, Sanjay Pujare <[email protected]> >>>> wrote: >>>> >>>> You are right, I was subconsciously thinking about the THREAD_LOCAL case >>>>> with a single container and a simple DAG and in that case Vlad’s >>>>> >>>> assumption >>>> >>>>> might not be valid but may be it is. >>>>> >>>>> On 8/11/16, 11:47 AM, "Munagala Ramanath" <[email protected]> wrote: >>>>> >>>>> If I understand Vlad correctly, what he is saying is that each >>>>> >>>> operator >>>> >>>>> saves currentThread in >>>>> its own setup() and checks it in its own output methods. The >>>>> threads >>>>> >>>> in >>>> >>>>> different operators are >>>>> running potentially on different nodes and/or processes and there >>>>> >>>> will >>>> >>>>> be >>>>> no connection between them. >>>>> >>>>> Ram >>>>> >>>>> On Thu, Aug 11, 2016 at 11:41 AM, Sanjay Pujare < >>>>> [email protected]> >>>>> wrote: >>>>> >>>>> > Name check is expensive, agreed, but there isn’t anything else >>>>> currently. >>>>> > Ideally the stram engine (considering that it is an engine >>>>> >>>> providing >>>> >>>>> > resources like threads etc) should use a ThreadFactory or a >>>>> ThreadGroup to >>>>> > create operator threads so identification and adding >>>>> functionality >>>>> >>>> is >>>> >>>>> > easier. >>>>> > >>>>> > The idea of checking for the same thread between setup() and >>>>> emit() >>>>> won’t >>>>> > work because the emit() check will have to be in the Sink >>>>> hierarchy >>>>> and >>>>> > AFAIK a Sink object doesn’t have access to the corresponding >>>>> operator, >>>>> > right? Another more fundamental problem probably is that these >>>>> threads >>>>> > don’t have to match. The emit() for any operator (or rather a >>>>> Sink >>>>> related >>>>> > to an operator) is ultimately triggered by an emitTuple() on the >>>>> topmost >>>>> > input operator in that path which happens in that input >>>>> operator’s >>>>> thread >>>>> > which doesn’t have to match the thread calling setup() in the >>>>> downstream >>>>> > operators, right? >>>>> > >>>>> > >>>>> > On 8/11/16, 10:59 AM, "Vlad Rozov" <[email protected]> >>>>> >>>> wrote: >>>> >>>>> > >>>>> > Name verification is too expensive, it will be sufficient to >>>>> store >>>>> > currentThread during setup() and verify that it is the same >>>>> during >>>>> > emit. >>>>> > Checks should be supported not only for DefaultOutputPort, so >>>>> >>>> we >>>> >>>>> may >>>>> > have it implemented in various Sinks. >>>>> > >>>>> > Vlad >>>>> > >>>>> > On 8/11/16 10:21, Sanjay Pujare wrote: >>>>> > > Thinking more about this – all of the “operator” threads >>>>> are >>>>> created >>>>> > by the Stram engine with appropriate names. So we can put checks >>>>> in >>>>> the >>>>> > DefaultOutputPort.emit() or in the various implementations of >>>>> Sink.put() >>>>> > that the current-thread is one created by the Stram engine (by >>>>> verifying >>>>> > the name). >>>>> > > >>>>> > > We can even use a special Thread object for operator >>>>> threads >>>>> so the >>>>> > above detection is easier. >>>>> > > >>>>> > > >>>>> > > >>>>> > > On 8/10/16, 6:11 PM, "Amol Kekre" <[email protected]> >>>>> wrote: >>>>> > > >>>>> > > +1 on debug proposal. Even if tuples lands up within >>>>> the >>>>> > window, it breaks >>>>> > > all guarantees. A rerun (after restart from a >>>>> >>>> checkpoint) >>>> >>>>> can >>>>> > have tuples >>>>> > > in different windows from this thread. A separate >>>>> thread >>>>> simply >>>>> > exposes >>>>> > > users to unwarranted risks. >>>>> > > >>>>> > > Thks >>>>> > > Amol >>>>> > > >>>>> > > >>>>> > > On Wed, Aug 10, 2016 at 6:05 PM, Vlad Rozov < >>>>> > [email protected]> wrote: >>>>> > > >>>>> > > > Tuples emitted between end and begin windows is only >>>>> one of >>>>> > possible >>>>> > > > behaviors that emitting tuples on a separate from >>>>> the >>>>> > operator thread may >>>>> > > > introduce. It will be good to have both checks in >>>>> >>>> place >>>> >>>>> at >>>>> > run-time and if >>>>> > > > checking for the operator thread for every emitted >>>>> tuple is >>>>> > too expensive, >>>>> > > > we may have it enabled only in DEBUG or mode with >>>>> more >>>>> checks >>>>> > in place. >>>>> > > > >>>>> > > > Vlad >>>>> > > > >>>>> > > > >>>>> > > > Sanjay just reminded me of my typo -> I meant >>>>> between >>>>> > end_window and >>>>> > > >> start_window :) >>>>> > > >> >>>>> > > >> Thks >>>>> > > >> Amol >>>>> > > >> >>>>> > > >> On Wed, Aug 10, 2016 at 2:36 PM, Sanjay Pujare < >>>>> > [email protected]> >>>>> > > >> wrote: >>>>> > > >> >>>>> > > >> If the goal is to do this validation through static >>>>> analysis >>>>> > of operator >>>>> > > >>> code, I guess it is possible but is going to be >>>>> > non-trivial. And there >>>>> > > >>> could be false positives and false negatives. >>>>> > > >>> >>>>> > > >>> Also I suppose this discussion applies to >>>>> processor >>>>> > operators (those >>>>> > > >>> having both in and out ports) so Ram’s example of >>>>> > JdbcPollInputOperator >>>>> > > >>> may >>>>> > > >>> not be applicable here? >>>>> > > >>> >>>>> > > >>> On 8/10/16, 2:04 PM, "Ashwin Chandra Putta" < >>>>> > [email protected]> >>>>> > > >>> wrote: >>>>> > > >>> >>>>> > > >>> In a separate thread I mean. >>>>> > > >>> >>>>> > > >>> Regards, >>>>> > > >>> Ashwin. >>>>> > > >>> >>>>> > > >>> On Wed, Aug 10, 2016 at 2:01 PM, Ashwin >>>>> Chandra >>>>> Putta < >>>>> > > >>> [email protected]> wrote: >>>>> > > >>> >>>>> > > >>> > + [email protected] >>>>> > > >>> > - [email protected] >>>>> > > >>> > >>>>> > > >>> > This is one of those best practices that we >>>>> learn by >>>>> > experience >>>>> > > >>> during >>>>> > > >>> > operator development. It will save a lot of >>>>> time >>>>> > during operator >>>>> > > >>> > development if we can catch and throw >>>>> validation >>>>> > error when >>>>> > > >>> someone >>>>> > > >>> emits >>>>> > > >>> > tuples in a non separate thread. >>>>> > > >>> > >>>>> > > >>> > Regards, >>>>> > > >>> > Ashwin >>>>> > > >>> > >>>>> > > >>> > On Wed, Aug 10, 2016 at 1:57 PM, Munagala >>>>> Ramanath < >>>>> > > >>> [email protected]> >>>>> > > >>> > wrote: >>>>> > > >>> > >>>>> > > >>> >> For cases where use of a different thread >>>>> is >>>>> > needed, it can write >>>>> > > >>> tuples >>>>> > > >>> >> to a queue from where the operator thread >>>>> pulls >>>>> > them -- >>>>> > > >>> >> JdbcPollInputOperator in Malhar has an >>>>> example. >>>>> > > >>> >> >>>>> > > >>> >> Ram >>>>> > > >>> >> >>>>> > > >>> >> On Wed, Aug 10, 2016 at 1:50 PM, >>>>> [email protected] < >>>>> > > >>> [email protected] >>>>> > > >>> >> wrote: >>>>> > > >>> >> >>>>> > > >>> >>> Hey Vlad, >>>>> > > >>> >>> >>>>> > > >>> >>> Thanks for bringing this up. Is there an >>>>> easy way >>>>> > to detect >>>>> > > >>> unexpected >>>>> > > >>> >>> use of emit method without hurt the >>>>> performance. >>>>> > Or at least if >>>>> > > >>> we >>>>> > > >>> can >>>>> > > >>> >>> detect this in debug mode. >>>>> > > >>> >>> >>>>> > > >>> >>> Regards, >>>>> > > >>> >>> Siyuan >>>>> > > >>> >>> >>>>> > > >>> >>> On Wed, Aug 10, 2016 at 11:27 AM, Vlad >>>>> >>>> Rozov >>>> >>>>> < >>>>> > > >>> [email protected]> >>>>> > > >>> >>> wrote: >>>>> > > >>> >>> >>>>> > > >>> >>>> The short answer is no, creating worker >>>>> thread to >>>>> > emit tuples >>>>> > > >>> is >>>>> > > >>> not >>>>> > > >>> >>>> supported by Apex and will lead to an >>>>> undefined >>>>> > behavior. >>>>> > > >>> Operators in Apex >>>>> > > >>> >>>> have strong thread affinity and all >>>>> interaction >>>>> > with the >>>>> > > >>> platform >>>>> > > >>> must >>>>> > > >>> >>>> happen on the operator thread. >>>>> > > >>> >>>> >>>>> > > >>> >>>> Vlad >>>>> > > >>> >>>> >>>>> > > >>> >>> >>>>> > > >>> >>> >>>>> > > >>> >> >>>>> > > >>> > >>>>> > > >>> > >>>>> > > >>> > -- >>>>> > > >>> > >>>>> > > >>> > Regards, >>>>> > > >>> > Ashwin. >>>>> > > >>> > >>>>> > > >>> >>>>> > > >>> >>>>> > > >>> >>>>> > > >>> -- >>>>> > > >>> >>>>> > > >>> Regards, >>>>> > > >>> Ashwin. >>>>> > > >>> >>>>> > > >>> >>>>> > > >>> >>>>> > > >>> >>>>> > > >>> >>>>> > > > >>>>> > > >>>>> > > >>>>> > > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> >>>>> >>>>> >>>>> >>
