Correct, except that it is the Sink, not the Operator, that will need to save the current thread during setup(). The Sink does not need access to the Operator; it is sufficient to rely on the platform to call the setup() method on the operator thread.
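
For illustration, the check described above could be sketched roughly as follows. This is a minimal sketch, not Apex code: TupleSink is a hypothetical stand-in for the platform's Sink interface, ThreadCheckingSink and ThreadCheckDemo are made-up names, and it assumes the platform calls setup() on the operator thread.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for the platform's Sink interface; only put() is modeled.
interface TupleSink<T> {
    void put(T tuple);
}

// Wraps another sink, remembers the thread that called setup(),
// and rejects put() calls made from any other thread.
final class ThreadCheckingSink<T> implements TupleSink<T> {
    private final TupleSink<T> delegate;
    private volatile Thread operatorThread; // recorded in setup()

    ThreadCheckingSink(TupleSink<T> delegate) {
        this.delegate = delegate;
    }

    // Assumption: the platform invokes this on the operator thread.
    void setup() {
        operatorThread = Thread.currentThread();
    }

    @Override
    public void put(T tuple) {
        // Reference comparison only; no thread-name lookup is involved.
        if (Thread.currentThread() != operatorThread) {
            throw new IllegalStateException(
                "put() called on " + Thread.currentThread().getName()
                + " instead of the thread that called setup()");
        }
        delegate.put(tuple);
    }
}

// Small helper used only to demonstrate the check.
final class ThreadCheckDemo {
    // Calls sink.put(tuple) on a fresh thread; returns true if the sink rejected it.
    static <T> boolean rejectedOnOtherThread(TupleSink<T> sink, T tuple) {
        AtomicBoolean rejected = new AtomicBoolean(false);
        Thread worker = new Thread(() -> {
            try {
                sink.put(tuple);
            } catch (IllegalStateException e) {
                rejected.set(true);
            }
        });
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return rejected.get();
    }
}
```

The per-tuple cost here is one volatile read and one reference comparison, which is why it is cheaper than comparing thread names.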

Vlad


On 8/11/16 11:47, Munagala Ramanath wrote:
If I understand Vlad correctly, what he is saying is that each operator
saves currentThread in its own setup() and checks it in its own output
methods. The threads in different operators potentially run on different
nodes and/or processes, and there is no connection between them.

Ram

On Thu, Aug 11, 2016 at 11:41 AM, Sanjay Pujare <san...@datatorrent.com>
wrote:

Name check is expensive, agreed, but there isn’t anything else currently.
Ideally the stram engine (considering that it is an engine providing
resources like threads etc) should use a ThreadFactory or a ThreadGroup to
create operator threads so identification and adding functionality is
easier.
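
As a rough illustration of the ThreadFactory idea, the sketch below marks engine-created threads with a dedicated Thread subclass so that the check becomes an instanceof test rather than a name comparison. All names here (OperatorThread, OperatorThreadFactory, OperatorThreads) are hypothetical and not part of Apex; note that this only tells whether the caller is some operator thread, not whether it is the right one for a given operator.

```java
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical marker type for threads created by the engine for operators.
final class OperatorThread extends Thread {
    OperatorThread(Runnable task, String name) {
        super(task, name);
    }
}

// Hypothetical factory the engine could use to create every operator thread.
final class OperatorThreadFactory implements ThreadFactory {
    private final AtomicInteger counter = new AtomicInteger();

    @Override
    public Thread newThread(Runnable task) {
        return new OperatorThread(task, "operator-" + counter.incrementAndGet());
    }
}

final class OperatorThreads {
    // Type check on the current thread; no string comparison needed.
    static boolean isOperatorThread() {
        return Thread.currentThread() instanceof OperatorThread;
    }

    // Demonstration helper: runs the check on a thread made by the given factory.
    static boolean checkOnThreadFrom(ThreadFactory factory) {
        AtomicBoolean result = new AtomicBoolean(false);
        Thread t = factory.newThread(() -> result.set(isOperatorThread()));
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return result.get();
    }
}
```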

The idea of checking for the same thread between setup() and emit() won’t
work because the emit() check will have to be in the Sink hierarchy and
AFAIK a Sink object doesn’t have access to the corresponding operator,
right? Another more fundamental problem probably is that these threads
don’t have to match. The emit() for any operator (or rather a Sink related
to an operator) is ultimately triggered by an emitTuple() on the topmost
input operator in that path which happens in that input operator’s thread
which doesn’t have to match the thread calling setup() in the downstream
operators, right?


On 8/11/16, 10:59 AM, "Vlad Rozov" <v.ro...@datatorrent.com> wrote:

     Name verification is too expensive; it will be sufficient to store
     currentThread during setup() and verify that it is the same during
     emit().
     Checks should be supported not only for DefaultOutputPort, so we may
     have it implemented in various Sinks.

     Vlad

     On 8/11/16 10:21, Sanjay Pujare wrote:
     > Thinking more about this – all of the “operator” threads are created
     > by the Stram engine with appropriate names. So we can put checks in the
     > DefaultOutputPort.emit() or in the various implementations of Sink.put()
     > that the current-thread is one created by the Stram engine (by verifying
     > the name).
     >
     > We can even use a special Thread object for operator threads so the
     > above detection is easier.
     >
     >
     >
     > On 8/10/16, 6:11 PM, "Amol Kekre" <a...@datatorrent.com> wrote:
     >
     >      +1 on debug proposal. Even if the tuples land within the window, it
     >      breaks all guarantees. A rerun (after restart from a checkpoint) can
     >      have tuples in different windows from this thread. A separate thread
     >      simply exposes users to unwarranted risks.
     >
     >      Thks
     >      Amol
     >
     >
     >      On Wed, Aug 10, 2016 at 6:05 PM, Vlad Rozov <v.ro...@datatorrent.com> wrote:
     >
     >      > Tuples emitted between end and begin windows are only one of the
     >      > possible behaviors that emitting tuples on a thread other than the
     >      > operator thread may introduce. It would be good to have both checks
     >      > in place at run-time, and if checking for the operator thread on
     >      > every emitted tuple is too expensive, we may enable it only in DEBUG
     >      > mode or in a mode with more checks in place.
     >      >
     >      > Vlad
     >      >
     >      >
     >      >> Sanjay just reminded me of my typo -> I meant between end_window and
     >      >> start_window :)
     >      >>
     >      >> Thks
     >      >> Amol
     >      >>
     >      >> On Wed, Aug 10, 2016 at 2:36 PM, Sanjay Pujare <san...@datatorrent.com> wrote:
     >      >>
     >      >>> If the goal is to do this validation through static analysis of
     >      >>> operator code, I guess it is possible but is going to be non-trivial.
     >      >>> And there could be false positives and false negatives.
     >      >>>
     >      >>> Also I suppose this discussion applies to processor operators (those
     >      >>> having both in and out ports) so Ram’s example of JdbcPollInputOperator
     >      >>> may not be applicable here?
     >      >>>
     >      >>> On 8/10/16, 2:04 PM, "Ashwin Chandra Putta" <ashwinchand...@gmail.com> wrote:
     >      >>>
     >      >>>      In a separate thread I mean.
     >      >>>
     >      >>>      Regards,
     >      >>>      Ashwin.
     >      >>>
     >      >>>      On Wed, Aug 10, 2016 at 2:01 PM, Ashwin Chandra Putta <
     >      >>>      ashwinchand...@gmail.com> wrote:
     >      >>>
     >      >>>      > + dev@apex.apache.org
     >      >>>      > - us...@apex.apache.org
     >      >>>      >
     >      >>>      > This is one of those best practices that we learn by experience
     >      >>>      > during operator development. It will save a lot of time during
     >      >>>      > operator development if we can catch and throw a validation error
     >      >>>      > when someone emits tuples in a non separate thread.
     >      >>>      >
     >      >>>      > Regards,
     >      >>>      > Ashwin
     >      >>>      >
     >      >>>      > On Wed, Aug 10, 2016 at 1:57 PM, Munagala Ramanath <r...@datatorrent.com> wrote:
     >      >>>      >
     >      >>>      >> For cases where use of a different thread is needed, it can write
     >      >>>      >> tuples to a queue from where the operator thread pulls them --
     >      >>>      >> JdbcPollInputOperator in Malhar has an example.
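
The queue hand-off described above might look roughly like the following. This is a simplified sketch under stated assumptions, not the JdbcPollInputOperator code and not the Apex API: QueueDrainingSource and QueueDrainDemo are hypothetical names, the Consumer stands in for an output port's emit(), and emitPending() is assumed to be called by the platform on the operator thread.

```java
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.Consumer;

// Hypothetical, simplified stand-in for an input operator.
final class QueueDrainingSource<T> {
    private final Queue<T> pending = new ConcurrentLinkedQueue<>();
    private final Consumer<T> output; // stands in for an output port's emit()

    QueueDrainingSource(Consumer<T> output) {
        this.output = output;
    }

    // Safe to call from a worker thread: it only enqueues and never emits.
    void offer(T tuple) {
        pending.add(tuple);
    }

    // Meant to be called on the operator thread: drains up to maxTuples
    // from the queue, emits them, and returns how many were emitted.
    int emitPending(int maxTuples) {
        int emitted = 0;
        T tuple;
        while (emitted < maxTuples && (tuple = pending.poll()) != null) {
            output.accept(tuple);
            emitted++;
        }
        return emitted;
    }
}

// Demonstration helper: produces tuples on a separate worker thread.
final class QueueDrainDemo {
    static <T> void offerFromWorker(QueueDrainingSource<T> source, List<T> tuples) {
        Thread worker = new Thread(() -> tuples.forEach(source::offer));
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

The worker never touches the output; every emit happens in emitPending(), so all interaction with the platform stays on the calling (operator) thread.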
     >      >>>      >>
     >      >>>      >> Ram
     >      >>>      >>
     >      >>>      >> On Wed, Aug 10, 2016 at 1:50 PM, hsy...@gmail.com <hsy...@gmail.com> wrote:
     >      >>>      >>
     >      >>>      >>> Hey Vlad,
     >      >>>      >>>
     >      >>>      >>> Thanks for bringing this up. Is there an easy way to detect
     >      >>>      >>> unexpected use of the emit method without hurting performance?
     >      >>>      >>> Or at least, can we detect this in debug mode?
     >      >>>      >>>
     >      >>>      >>> Regards,
     >      >>>      >>> Siyuan
     >      >>>      >>>
     >      >>>      >>> On Wed, Aug 10, 2016 at 11:27 AM, Vlad Rozov <v.ro...@datatorrent.com> wrote:
     >      >>>      >>>
     >      >>>      >>>> The short answer is no: creating a worker thread to emit tuples
     >      >>>      >>>> is not supported by Apex and will lead to undefined behavior.
     >      >>>      >>>> Operators in Apex have strong thread affinity and all interaction
     >      >>>      >>>> with the platform must happen on the operator thread.
     >      >>>      >>>>
     >      >>>      >>>> Vlad
     >      >>>      >>>>
     >      >>>      >>>
     >      >>>      >>>
     >      >>>      >>
     >      >>>      >
     >      >>>      >
     >      >>>      > --
     >      >>>      >
     >      >>>      > Regards,
     >      >>>      > Ashwin.
     >      >>>      >
     >      >>>
     >      >>>
     >      >>>
     >      >>>      --
     >      >>>
     >      >>>      Regards,
     >      >>>      Ashwin.
     >      >>>
     >      >>>
     >      >>>
     >      >>>
     >      >>>
     >      >
     >
     >
     >





