Vlad,
I agree that the check should be ON by default. Ability to turn it off for
entire app is fine, per port is not needed.

Thks
Amol

On Wed, Oct 12, 2016 at 10:34 PM, Tushar Gosavi <tus...@datatorrent.com>
wrote:

> +1 for on by default and ability to turn it off for entire application.
>
> - Tushar.
>
>
> On Thu, Oct 13, 2016 at 11:00 AM, Pradeep A. Dalvi <p...@apache.org>
> wrote:
> > +1 for ON by default
> > +1 for disabling it for all output ports
> >
> > With the kind of issues we have observed being faced by developers in the
> > past, I strongly believe this check should be ON by default.
> > However at the same time I feel, it shall be one-time check, mostly in
> > Development phase and before going into Production. Having said that, if
> > disabling it at application level i.e. for all operators and their
> > respective output ports would it make implementation simpler, then that
> can
> > be targeted first. Thoughts?
> >
> > --prad
> >
> > On Thu, Oct 13, 2016 at 7:32 AM, Vlad Rozov <v.ro...@datatorrent.com>
> wrote:
> >
> >> I run jmh test and check takes 1ns on my MacBook Pro and on the lab
> >> machine. This corresponds to 3% degradation at 30 million
> events/second. I
> >> think we can move forward with the check ON by default. Do we need an
> >> ability to turn OFF check for a specific operator and/or port? My
> thought
> >> is that such ability is not necessary and it should be OK to disable
> check
> >> for all output ports in an application.
> >>
> >> Vlad
> >>
> >>
> >> On 10/12/16 11:56, Amol Kekre wrote:
> >>
> >>> In case there turns out to be a penalty, we can introduce a "check for
> >>> thread affinity" mode that triggers this check. My initial thought is
> to
> >>> make this check ON by default. We should wait till benchmarks are
> >>> available
> >>> before discussing adding this check.
> >>>
> >>> Thks
> >>> Amol
> >>>
> >>>
> >>> On Wed, Oct 12, 2016 at 11:07 AM, Sanjay Pujare <
> san...@datatorrent.com>
> >>> wrote:
> >>>
> >>> A JIRA has been created for adding this thread affinity check
> >>>> https://issues.apache.org/jira/browse/APEXCORE-510 . I have made this
> >>>> enhancement in a branch
> >>>> https://github.com/sanjaypujare/apex-core/tree/malhar-510.
> >>>> thread_affinity
> >>>> and I have been benchmarking the performance with this change. I will
> be
> >>>> publishing the results in the above JIRA where we can discuss them and
> >>>> hopefully agree on merging this change.
> >>>>
> >>>> On Thu, Aug 11, 2016 at 1:41 PM, Sanjay Pujare <
> san...@datatorrent.com>
> >>>> wrote:
> >>>>
> >>>> You are right, I was subconsciously thinking about the THREAD_LOCAL
> case
> >>>>> with a single container and a simple DAG and in that case Vlad’s
> >>>>>
> >>>> assumption
> >>>>
> >>>>> might not be valid but may be it is.
> >>>>>
> >>>>> On 8/11/16, 11:47 AM, "Munagala Ramanath" <r...@datatorrent.com>
> wrote:
> >>>>>
> >>>>>      If I understand Vlad correctly, what he is saying is that each
> >>>>>
> >>>> operator
> >>>>
> >>>>>      saves currentThread in
> >>>>>      its own setup() and checks it in its own output methods. The
> >>>>> threads
> >>>>>
> >>>> in
> >>>>
> >>>>>      different operators are
> >>>>>      running potentially on different nodes and/or processes and
> there
> >>>>>
> >>>> will
> >>>>
> >>>>> be
> >>>>>      no connection between them.
> >>>>>
> >>>>>      Ram
> >>>>>
> >>>>>      On Thu, Aug 11, 2016 at 11:41 AM, Sanjay Pujare <
> >>>>> san...@datatorrent.com>
> >>>>>      wrote:
> >>>>>
> >>>>>      > Name check is expensive, agreed, but there isn’t anything else
> >>>>> currently.
> >>>>>      > Ideally the stram engine (considering that it is an engine
> >>>>>
> >>>> providing
> >>>>
> >>>>>      > resources like threads etc) should use a ThreadFactory or a
> >>>>> ThreadGroup to
> >>>>>      > create operator threads so identification and adding
> >>>>> functionality
> >>>>>
> >>>> is
> >>>>
> >>>>>      > easier.
> >>>>>      >
> >>>>>      > The idea of checking for the same thread between setup() and
> >>>>> emit()
> >>>>> won’t
> >>>>>      > work because the emit() check will have to be in the Sink
> >>>>> hierarchy
> >>>>> and
> >>>>>      > AFAIK a Sink object doesn’t have access to the corresponding
> >>>>> operator,
> >>>>>      > right? Another more fundamental problem probably is that these
> >>>>> threads
> >>>>>      > don’t have to match. The emit() for any operator (or rather a
> >>>>> Sink
> >>>>> related
> >>>>>      > to an operator) is ultimately triggered by an emitTuple() on
> the
> >>>>> topmost
> >>>>>      > input operator in that path which happens in that input
> >>>>> operator’s
> >>>>> thread
> >>>>>      > which doesn’t have to match the thread calling setup() in the
> >>>>> downstream
> >>>>>      > operators, right?
> >>>>>      >
> >>>>>      >
> >>>>>      > On 8/11/16, 10:59 AM, "Vlad Rozov" <v.ro...@datatorrent.com>
> >>>>>
> >>>> wrote:
> >>>>
> >>>>>      >
> >>>>>      >     Name verification is too expensive, it will be sufficient
> to
> >>>>> store
> >>>>>      >     currentThread during setup() and verify that it is the
> same
> >>>>> during
> >>>>>      > emit.
> >>>>>      >     Checks should be supported not only for
> DefaultOutputPort, so
> >>>>>
> >>>> we
> >>>>
> >>>>> may
> >>>>>      >     have it implemented in various Sinks.
> >>>>>      >
> >>>>>      >     Vlad
> >>>>>      >
> >>>>>      >     On 8/11/16 10:21, Sanjay Pujare wrote:
> >>>>>      >     > Thinking more about this – all of the “operator” threads
> >>>>> are
> >>>>> created
> >>>>>      > by the Stram engine with appropriate names. So we can put
> checks
> >>>>> in
> >>>>> the
> >>>>>      > DefaultOutputPort.emit() or in the various implementations of
> >>>>> Sink.put()
> >>>>>      > that the current-thread is one created by the Stram engine (by
> >>>>> verifying
> >>>>>      > the name).
> >>>>>      >     >
> >>>>>      >     > We can even use a special Thread object for operator
> >>>>> threads
> >>>>> so the
> >>>>>      > above detection is easier.
> >>>>>      >     >
> >>>>>      >     >
> >>>>>      >     >
> >>>>>      >     > On 8/10/16, 6:11 PM, "Amol Kekre" <a...@datatorrent.com
> >
> >>>>> wrote:
> >>>>>      >     >
> >>>>>      >     >      +1 on debug proposal. Even if tuples lands up
> within
> >>>>> the
> >>>>>      > window, it breaks
> >>>>>      >     >      all guarantees. A rerun (after restart from a
> >>>>>
> >>>> checkpoint)
> >>>>
> >>>>> can
> >>>>>      > have tuples
> >>>>>      >     >      in different windows from this thread. A separate
> >>>>> thread
> >>>>> simply
> >>>>>      > exposes
> >>>>>      >     >      users to unwarranted risks.
> >>>>>      >     >
> >>>>>      >     >      Thks
> >>>>>      >     >      Amol
> >>>>>      >     >
> >>>>>      >     >
> >>>>>      >     >      On Wed, Aug 10, 2016 at 6:05 PM, Vlad Rozov <
> >>>>>      > v.ro...@datatorrent.com> wrote:
> >>>>>      >     >
> >>>>>      >     >      > Tuples emitted between end and begin windows is
> only
> >>>>> one of
> >>>>>      > possible
> >>>>>      >     >      > behaviors that emitting tuples on a separate from
> >>>>> the
> >>>>>      > operator thread may
> >>>>>      >     >      > introduce. It will be good to have both checks in
> >>>>>
> >>>> place
> >>>>
> >>>>> at
> >>>>>      > run-time and if
> >>>>>      >     >      > checking for the operator thread for every
> emitted
> >>>>> tuple is
> >>>>>      > too expensive,
> >>>>>      >     >      > we may have it enabled only in DEBUG or mode with
> >>>>> more
> >>>>> checks
> >>>>>      > in place.
> >>>>>      >     >      >
> >>>>>      >     >      > Vlad
> >>>>>      >     >      >
> >>>>>      >     >      >
> >>>>>      >     >      > Sanjay just reminded me of my typo -> I meant
> >>>>> between
> >>>>>      > end_window and
> >>>>>      >     >      >> start_window :)
> >>>>>      >     >      >>
> >>>>>      >     >      >> Thks
> >>>>>      >     >      >> Amol
> >>>>>      >     >      >>
> >>>>>      >     >      >> On Wed, Aug 10, 2016 at 2:36 PM, Sanjay Pujare <
> >>>>>      > san...@datatorrent.com>
> >>>>>      >     >      >> wrote:
> >>>>>      >     >      >>
> >>>>>      >     >      >> If the goal is to do this validation through
> static
> >>>>> analysis
> >>>>>      > of operator
> >>>>>      >     >      >>> code, I guess it is possible but is going to be
> >>>>>      > non-trivial. And there
> >>>>>      >     >      >>> could be false positives and false negatives.
> >>>>>      >     >      >>>
> >>>>>      >     >      >>> Also I suppose this discussion applies to
> >>>>> processor
> >>>>>      > operators (those
> >>>>>      >     >      >>> having both in and out ports) so Ram’s example
> of
> >>>>>      > JdbcPollInputOperator
> >>>>>      >     >      >>> may
> >>>>>      >     >      >>> not be applicable here?
> >>>>>      >     >      >>>
> >>>>>      >     >      >>> On 8/10/16, 2:04 PM, "Ashwin Chandra Putta" <
> >>>>>      > ashwinchand...@gmail.com>
> >>>>>      >     >      >>> wrote:
> >>>>>      >     >      >>>
> >>>>>      >     >      >>>      In a separate thread I mean.
> >>>>>      >     >      >>>
> >>>>>      >     >      >>>      Regards,
> >>>>>      >     >      >>>      Ashwin.
> >>>>>      >     >      >>>
> >>>>>      >     >      >>>      On Wed, Aug 10, 2016 at 2:01 PM, Ashwin
> >>>>> Chandra
> >>>>> Putta <
> >>>>>      >     >      >>>      ashwinchand...@gmail.com> wrote:
> >>>>>      >     >      >>>
> >>>>>      >     >      >>>      > + dev@apex.apache.org
> >>>>>      >     >      >>>      > - us...@apex.apache.org
> >>>>>      >     >      >>>      >
> >>>>>      >     >      >>>      > This is one of those best practices
> that we
> >>>>> learn by
> >>>>>      > experience
> >>>>>      >     >      >>> during
> >>>>>      >     >      >>>      > operator development. It will save a
> lot of
> >>>>> time
> >>>>>      > during operator
> >>>>>      >     >      >>>      > development if we can catch and throw
> >>>>> validation
> >>>>>      > error when
> >>>>>      >     >      >>> someone
> >>>>>      >     >      >>> emits
> >>>>>      >     >      >>>      > tuples in a non separate thread.
> >>>>>      >     >      >>>      >
> >>>>>      >     >      >>>      > Regards,
> >>>>>      >     >      >>>      > Ashwin
> >>>>>      >     >      >>>      >
> >>>>>      >     >      >>>      > On Wed, Aug 10, 2016 at 1:57 PM,
> Munagala
> >>>>> Ramanath <
> >>>>>      >     >      >>> r...@datatorrent.com>
> >>>>>      >     >      >>>      > wrote:
> >>>>>      >     >      >>>      >
> >>>>>      >     >      >>>      >> For cases where use of a different
> thread
> >>>>> is
> >>>>>      > needed, it can write
> >>>>>      >     >      >>> tuples
> >>>>>      >     >      >>>      >> to a queue from where the operator
> thread
> >>>>> pulls
> >>>>>      > them --
> >>>>>      >     >      >>>      >> JdbcPollInputOperator in Malhar has an
> >>>>> example.
> >>>>>      >     >      >>>      >>
> >>>>>      >     >      >>>      >> Ram
> >>>>>      >     >      >>>      >>
> >>>>>      >     >      >>>      >> On Wed, Aug 10, 2016 at 1:50 PM,
> >>>>> hsy...@gmail.com <
> >>>>>      >     >      >>> hsy...@gmail.com
> >>>>>      >     >      >>>      >> wrote:
> >>>>>      >     >      >>>      >>
> >>>>>      >     >      >>>      >>> Hey Vlad,
> >>>>>      >     >      >>>      >>>
> >>>>>      >     >      >>>      >>> Thanks for bringing this up. Is there
> an
> >>>>> easy way
> >>>>>      > to detect
> >>>>>      >     >      >>> unexpected
> >>>>>      >     >      >>>      >>> use of emit method without hurt the
> >>>>> performance.
> >>>>>      > Or at least if
> >>>>>      >     >      >>> we
> >>>>>      >     >      >>> can
> >>>>>      >     >      >>>      >>> detect this in debug mode.
> >>>>>      >     >      >>>      >>>
> >>>>>      >     >      >>>      >>> Regards,
> >>>>>      >     >      >>>      >>> Siyuan
> >>>>>      >     >      >>>      >>>
> >>>>>      >     >      >>>      >>> On Wed, Aug 10, 2016 at 11:27 AM, Vlad
> >>>>>
> >>>> Rozov
> >>>>
> >>>>> <
> >>>>>      >     >      >>> v.ro...@datatorrent.com>
> >>>>>      >     >      >>>      >>> wrote:
> >>>>>      >     >      >>>      >>>
> >>>>>      >     >      >>>      >>>> The short answer is no, creating
> worker
> >>>>> thread to
> >>>>>      > emit tuples
> >>>>>      >     >      >>> is
> >>>>>      >     >      >>> not
> >>>>>      >     >      >>>      >>>> supported by Apex and will lead to an
> >>>>> undefined
> >>>>>      > behavior.
> >>>>>      >     >      >>> Operators in Apex
> >>>>>      >     >      >>>      >>>> have strong thread affinity and all
> >>>>> interaction
> >>>>>      > with the
> >>>>>      >     >      >>> platform
> >>>>>      >     >      >>> must
> >>>>>      >     >      >>>      >>>> happen on the operator thread.
> >>>>>      >     >      >>>      >>>>
> >>>>>      >     >      >>>      >>>> Vlad
> >>>>>      >     >      >>>      >>>>
> >>>>>      >     >      >>>      >>>
> >>>>>      >     >      >>>      >>>
> >>>>>      >     >      >>>      >>
> >>>>>      >     >      >>>      >
> >>>>>      >     >      >>>      >
> >>>>>      >     >      >>>      > --
> >>>>>      >     >      >>>      >
> >>>>>      >     >      >>>      > Regards,
> >>>>>      >     >      >>>      > Ashwin.
> >>>>>      >     >      >>>      >
> >>>>>      >     >      >>>
> >>>>>      >     >      >>>
> >>>>>      >     >      >>>
> >>>>>      >     >      >>>      --
> >>>>>      >     >      >>>
> >>>>>      >     >      >>>      Regards,
> >>>>>      >     >      >>>      Ashwin.
> >>>>>      >     >      >>>
> >>>>>      >     >      >>>
> >>>>>      >     >      >>>
> >>>>>      >     >      >>>
> >>>>>      >     >      >>>
> >>>>>      >     >      >
> >>>>>      >     >
> >>>>>      >     >
> >>>>>      >     >
> >>>>>      >
> >>>>>      >
> >>>>>      >
> >>>>>      >
> >>>>>      >
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>
>

Reply via email to