+1 for on by default and ability to turn it off for entire application.

- Tushar.


On Thu, Oct 13, 2016 at 11:00 AM, Pradeep A. Dalvi <p...@apache.org> wrote:
> +1 for ON by default
> +1 for disabling it for all output ports
>
> With the kind of issues we have observed being faced by developers in the
> past, I strongly believe this check should be ON by default.
> However at the same time I feel, it shall be one-time check, mostly in
> Development phase and before going into Production. Having said that, if
> disabling it at application level i.e. for all operators and their
> respective output ports would it make implementation simpler, then that can
> be targeted first. Thoughts?
>
> --prad
>
> On Thu, Oct 13, 2016 at 7:32 AM, Vlad Rozov <v.ro...@datatorrent.com> wrote:
>
>> I run jmh test and check takes 1ns on my MacBook Pro and on the lab
>> machine. This corresponds to 3% degradation at 30 million events/second. I
>> think we can move forward with the check ON by default. Do we need an
>> ability to turn OFF check for a specific operator and/or port? My thought
>> is that such ability is not necessary and it should be OK to disable check
>> for all output ports in an application.
>>
>> Vlad
>>
>>
>> On 10/12/16 11:56, Amol Kekre wrote:
>>
>>> In case there turns out to be a penalty, we can introduce a "check for
>>> thread affinity" mode that triggers this check. My initial thought is to
>>> make this check ON by default. We should wait till benchmarks are
>>> available
>>> before discussing adding this check.
>>>
>>> Thks
>>> Amol
>>>
>>>
>>> On Wed, Oct 12, 2016 at 11:07 AM, Sanjay Pujare <san...@datatorrent.com>
>>> wrote:
>>>
>>> A JIRA has been created for adding this thread affinity check
>>>> https://issues.apache.org/jira/browse/APEXCORE-510 . I have made this
>>>> enhancement in a branch
>>>> https://github.com/sanjaypujare/apex-core/tree/malhar-510.
>>>> thread_affinity
>>>> and I have been benchmarking the performance with this change. I will be
>>>> publishing the results in the above JIRA where we can discuss them and
>>>> hopefully agree on merging this change.
>>>>
>>>> On Thu, Aug 11, 2016 at 1:41 PM, Sanjay Pujare <san...@datatorrent.com>
>>>> wrote:
>>>>
>>>> You are right, I was subconsciously thinking about the THREAD_LOCAL case
>>>>> with a single container and a simple DAG and in that case Vlad’s
>>>>>
>>>> assumption
>>>>
>>>>> might not be valid but may be it is.
>>>>>
>>>>> On 8/11/16, 11:47 AM, "Munagala Ramanath" <r...@datatorrent.com> wrote:
>>>>>
>>>>>      If I understand Vlad correctly, what he is saying is that each
>>>>>
>>>> operator
>>>>
>>>>>      saves currentThread in
>>>>>      its own setup() and checks it in its own output methods. The
>>>>> threads
>>>>>
>>>> in
>>>>
>>>>>      different operators are
>>>>>      running potentially on different nodes and/or processes and there
>>>>>
>>>> will
>>>>
>>>>> be
>>>>>      no connection between them.
>>>>>
>>>>>      Ram
>>>>>
>>>>>      On Thu, Aug 11, 2016 at 11:41 AM, Sanjay Pujare <
>>>>> san...@datatorrent.com>
>>>>>      wrote:
>>>>>
>>>>>      > Name check is expensive, agreed, but there isn’t anything else
>>>>> currently.
>>>>>      > Ideally the stram engine (considering that it is an engine
>>>>>
>>>> providing
>>>>
>>>>>      > resources like threads etc) should use a ThreadFactory or a
>>>>> ThreadGroup to
>>>>>      > create operator threads so identification and adding
>>>>> functionality
>>>>>
>>>> is
>>>>
>>>>>      > easier.
>>>>>      >
>>>>>      > The idea of checking for the same thread between setup() and
>>>>> emit()
>>>>> won’t
>>>>>      > work because the emit() check will have to be in the Sink
>>>>> hierarchy
>>>>> and
>>>>>      > AFAIK a Sink object doesn’t have access to the corresponding
>>>>> operator,
>>>>>      > right? Another more fundamental problem probably is that these
>>>>> threads
>>>>>      > don’t have to match. The emit() for any operator (or rather a
>>>>> Sink
>>>>> related
>>>>>      > to an operator) is ultimately triggered by an emitTuple() on the
>>>>> topmost
>>>>>      > input operator in that path which happens in that input
>>>>> operator’s
>>>>> thread
>>>>>      > which doesn’t have to match the thread calling setup() in the
>>>>> downstream
>>>>>      > operators, right?
>>>>>      >
>>>>>      >
>>>>>      > On 8/11/16, 10:59 AM, "Vlad Rozov" <v.ro...@datatorrent.com>
>>>>>
>>>> wrote:
>>>>
>>>>>      >
>>>>>      >     Name verification is too expensive, it will be sufficient to
>>>>> store
>>>>>      >     currentThread during setup() and verify that it is the same
>>>>> during
>>>>>      > emit.
>>>>>      >     Checks should be supported not only for DefaultOutputPort, so
>>>>>
>>>> we
>>>>
>>>>> may
>>>>>      >     have it implemented in various Sinks.
>>>>>      >
>>>>>      >     Vlad
>>>>>      >
>>>>>      >     On 8/11/16 10:21, Sanjay Pujare wrote:
>>>>>      >     > Thinking more about this – all of the “operator” threads
>>>>> are
>>>>> created
>>>>>      > by the Stram engine with appropriate names. So we can put checks
>>>>> in
>>>>> the
>>>>>      > DefaultOutputPort.emit() or in the various implementations of
>>>>> Sink.put()
>>>>>      > that the current-thread is one created by the Stram engine (by
>>>>> verifying
>>>>>      > the name).
>>>>>      >     >
>>>>>      >     > We can even use a special Thread object for operator
>>>>> threads
>>>>> so the
>>>>>      > above detection is easier.
>>>>>      >     >
>>>>>      >     >
>>>>>      >     >
>>>>>      >     > On 8/10/16, 6:11 PM, "Amol Kekre" <a...@datatorrent.com>
>>>>> wrote:
>>>>>      >     >
>>>>>      >     >      +1 on debug proposal. Even if tuples lands up within
>>>>> the
>>>>>      > window, it breaks
>>>>>      >     >      all guarantees. A rerun (after restart from a
>>>>>
>>>> checkpoint)
>>>>
>>>>> can
>>>>>      > have tuples
>>>>>      >     >      in different windows from this thread. A separate
>>>>> thread
>>>>> simply
>>>>>      > exposes
>>>>>      >     >      users to unwarranted risks.
>>>>>      >     >
>>>>>      >     >      Thks
>>>>>      >     >      Amol
>>>>>      >     >
>>>>>      >     >
>>>>>      >     >      On Wed, Aug 10, 2016 at 6:05 PM, Vlad Rozov <
>>>>>      > v.ro...@datatorrent.com> wrote:
>>>>>      >     >
>>>>>      >     >      > Tuples emitted between end and begin windows is only
>>>>> one of
>>>>>      > possible
>>>>>      >     >      > behaviors that emitting tuples on a separate from
>>>>> the
>>>>>      > operator thread may
>>>>>      >     >      > introduce. It will be good to have both checks in
>>>>>
>>>> place
>>>>
>>>>> at
>>>>>      > run-time and if
>>>>>      >     >      > checking for the operator thread for every emitted
>>>>> tuple is
>>>>>      > too expensive,
>>>>>      >     >      > we may have it enabled only in DEBUG or mode with
>>>>> more
>>>>> checks
>>>>>      > in place.
>>>>>      >     >      >
>>>>>      >     >      > Vlad
>>>>>      >     >      >
>>>>>      >     >      >
>>>>>      >     >      > Sanjay just reminded me of my typo -> I meant
>>>>> between
>>>>>      > end_window and
>>>>>      >     >      >> start_window :)
>>>>>      >     >      >>
>>>>>      >     >      >> Thks
>>>>>      >     >      >> Amol
>>>>>      >     >      >>
>>>>>      >     >      >> On Wed, Aug 10, 2016 at 2:36 PM, Sanjay Pujare <
>>>>>      > san...@datatorrent.com>
>>>>>      >     >      >> wrote:
>>>>>      >     >      >>
>>>>>      >     >      >> If the goal is to do this validation through static
>>>>> analysis
>>>>>      > of operator
>>>>>      >     >      >>> code, I guess it is possible but is going to be
>>>>>      > non-trivial. And there
>>>>>      >     >      >>> could be false positives and false negatives.
>>>>>      >     >      >>>
>>>>>      >     >      >>> Also I suppose this discussion applies to
>>>>> processor
>>>>>      > operators (those
>>>>>      >     >      >>> having both in and out ports) so Ram’s example of
>>>>>      > JdbcPollInputOperator
>>>>>      >     >      >>> may
>>>>>      >     >      >>> not be applicable here?
>>>>>      >     >      >>>
>>>>>      >     >      >>> On 8/10/16, 2:04 PM, "Ashwin Chandra Putta" <
>>>>>      > ashwinchand...@gmail.com>
>>>>>      >     >      >>> wrote:
>>>>>      >     >      >>>
>>>>>      >     >      >>>      In a separate thread I mean.
>>>>>      >     >      >>>
>>>>>      >     >      >>>      Regards,
>>>>>      >     >      >>>      Ashwin.
>>>>>      >     >      >>>
>>>>>      >     >      >>>      On Wed, Aug 10, 2016 at 2:01 PM, Ashwin
>>>>> Chandra
>>>>> Putta <
>>>>>      >     >      >>>      ashwinchand...@gmail.com> wrote:
>>>>>      >     >      >>>
>>>>>      >     >      >>>      > + dev@apex.apache.org
>>>>>      >     >      >>>      > - us...@apex.apache.org
>>>>>      >     >      >>>      >
>>>>>      >     >      >>>      > This is one of those best practices that we
>>>>> learn by
>>>>>      > experience
>>>>>      >     >      >>> during
>>>>>      >     >      >>>      > operator development. It will save a lot of
>>>>> time
>>>>>      > during operator
>>>>>      >     >      >>>      > development if we can catch and throw
>>>>> validation
>>>>>      > error when
>>>>>      >     >      >>> someone
>>>>>      >     >      >>> emits
>>>>>      >     >      >>>      > tuples in a non separate thread.
>>>>>      >     >      >>>      >
>>>>>      >     >      >>>      > Regards,
>>>>>      >     >      >>>      > Ashwin
>>>>>      >     >      >>>      >
>>>>>      >     >      >>>      > On Wed, Aug 10, 2016 at 1:57 PM, Munagala
>>>>> Ramanath <
>>>>>      >     >      >>> r...@datatorrent.com>
>>>>>      >     >      >>>      > wrote:
>>>>>      >     >      >>>      >
>>>>>      >     >      >>>      >> For cases where use of a different thread
>>>>> is
>>>>>      > needed, it can write
>>>>>      >     >      >>> tuples
>>>>>      >     >      >>>      >> to a queue from where the operator thread
>>>>> pulls
>>>>>      > them --
>>>>>      >     >      >>>      >> JdbcPollInputOperator in Malhar has an
>>>>> example.
>>>>>      >     >      >>>      >>
>>>>>      >     >      >>>      >> Ram
>>>>>      >     >      >>>      >>
>>>>>      >     >      >>>      >> On Wed, Aug 10, 2016 at 1:50 PM,
>>>>> hsy...@gmail.com <
>>>>>      >     >      >>> hsy...@gmail.com
>>>>>      >     >      >>>      >> wrote:
>>>>>      >     >      >>>      >>
>>>>>      >     >      >>>      >>> Hey Vlad,
>>>>>      >     >      >>>      >>>
>>>>>      >     >      >>>      >>> Thanks for bringing this up. Is there an
>>>>> easy way
>>>>>      > to detect
>>>>>      >     >      >>> unexpected
>>>>>      >     >      >>>      >>> use of emit method without hurt the
>>>>> performance.
>>>>>      > Or at least if
>>>>>      >     >      >>> we
>>>>>      >     >      >>> can
>>>>>      >     >      >>>      >>> detect this in debug mode.
>>>>>      >     >      >>>      >>>
>>>>>      >     >      >>>      >>> Regards,
>>>>>      >     >      >>>      >>> Siyuan
>>>>>      >     >      >>>      >>>
>>>>>      >     >      >>>      >>> On Wed, Aug 10, 2016 at 11:27 AM, Vlad
>>>>>
>>>> Rozov
>>>>
>>>>> <
>>>>>      >     >      >>> v.ro...@datatorrent.com>
>>>>>      >     >      >>>      >>> wrote:
>>>>>      >     >      >>>      >>>
>>>>>      >     >      >>>      >>>> The short answer is no, creating worker
>>>>> thread to
>>>>>      > emit tuples
>>>>>      >     >      >>> is
>>>>>      >     >      >>> not
>>>>>      >     >      >>>      >>>> supported by Apex and will lead to an
>>>>> undefined
>>>>>      > behavior.
>>>>>      >     >      >>> Operators in Apex
>>>>>      >     >      >>>      >>>> have strong thread affinity and all
>>>>> interaction
>>>>>      > with the
>>>>>      >     >      >>> platform
>>>>>      >     >      >>> must
>>>>>      >     >      >>>      >>>> happen on the operator thread.
>>>>>      >     >      >>>      >>>>
>>>>>      >     >      >>>      >>>> Vlad
>>>>>      >     >      >>>      >>>>
>>>>>      >     >      >>>      >>>
>>>>>      >     >      >>>      >>>
>>>>>      >     >      >>>      >>
>>>>>      >     >      >>>      >
>>>>>      >     >      >>>      >
>>>>>      >     >      >>>      > --
>>>>>      >     >      >>>      >
>>>>>      >     >      >>>      > Regards,
>>>>>      >     >      >>>      > Ashwin.
>>>>>      >     >      >>>      >
>>>>>      >     >      >>>
>>>>>      >     >      >>>
>>>>>      >     >      >>>
>>>>>      >     >      >>>      --
>>>>>      >     >      >>>
>>>>>      >     >      >>>      Regards,
>>>>>      >     >      >>>      Ashwin.
>>>>>      >     >      >>>
>>>>>      >     >      >>>
>>>>>      >     >      >>>
>>>>>      >     >      >>>
>>>>>      >     >      >>>
>>>>>      >     >      >
>>>>>      >     >
>>>>>      >     >
>>>>>      >     >
>>>>>      >
>>>>>      >
>>>>>      >
>>>>>      >
>>>>>      >
>>>>>
>>>>>
>>>>>
>>>>>
>>

Reply via email to