Name check is expensive, agreed, but there isn’t anything else currently. 
Ideally the stram engine (considering that it is an engine providing resources 
like threads etc) should use a ThreadFactory or a ThreadGroup to create 
operator threads so identification and adding functionality is easier.

The idea of checking for the same thread between setup() and emit() won’t work 
because the emit() check will have to be in the Sink hierarchy and AFAIK a Sink 
object doesn’t have access to the corresponding operator, right? Another more 
fundamental problem probably is that these threads don’t have to match. The 
emit() for any operator (or rather a Sink related to an operator) is ultimately 
triggered by an emitTuple() on the topmost input operator in that path which 
happens in that input operator’s thread which doesn’t have to match the thread 
calling setup() in the downstream operators, right?


On 8/11/16, 10:59 AM, "Vlad Rozov" <v.ro...@datatorrent.com> wrote:

    Name verification is too expensive, it will be sufficient to store 
    currentThread during setup() and verify that it is the same during emit. 
    Checks should be supported not only for DefaultOutputPort, so we may 
    have it implemented in various Sinks.
    
    Vlad
    
    On 8/11/16 10:21, Sanjay Pujare wrote:
    > Thinking more about this – all of the “operator” threads are created by 
the Stram engine with appropriate names. So we can put checks in the 
DefaultOutputPort.emit() or in the various implementations of Sink.put() that 
the current-thread is one created by the Stram engine (by verifying the name).
    >
    > We can even use a special Thread object for operator threads so the above 
detection is easier.
    >
    >
    >
    > On 8/10/16, 6:11 PM, "Amol Kekre" <a...@datatorrent.com> wrote:
    >
    >      +1 on debug proposal. Even if tuples lands up within the window, it 
breaks
    >      all guarantees. A rerun (after restart from a checkpoint) can have 
tuples
    >      in different windows from this thread. A separate thread simply 
exposes
    >      users to unwarranted risks.
    >      
    >      Thks
    >      Amol
    >      
    >      
    >      On Wed, Aug 10, 2016 at 6:05 PM, Vlad Rozov 
<v.ro...@datatorrent.com> wrote:
    >      
    >      > Tuples emitted between end and begin windows is only one of 
possible
    >      > behaviors that emitting tuples on a separate from the operator 
thread may
    >      > introduce. It will be good to have both checks in place at 
run-time and if
    >      > checking for the operator thread for every emitted tuple is too 
expensive,
    >      > we may have it enabled only in DEBUG or mode with more checks in 
place.
    >      >
    >      > Vlad
    >      >
    >      >
    >      > Sanjay just reminded me of my typo -> I meant between end_window 
and
    >      >> start_window :)
    >      >>
    >      >> Thks
    >      >> Amol
    >      >>
    >      >> On Wed, Aug 10, 2016 at 2:36 PM, Sanjay Pujare 
<san...@datatorrent.com>
    >      >> wrote:
    >      >>
    >      >> If the goal is to do this validation through static analysis of 
operator
    >      >>> code, I guess it is possible but is going to be non-trivial. And 
there
    >      >>> could be false positives and false negatives.
    >      >>>
    >      >>> Also I suppose this discussion applies to processor operators 
(those
    >      >>> having both in and out ports) so Ram’s example of 
JdbcPollInputOperator
    >      >>> may
    >      >>> not be applicable here?
    >      >>>
    >      >>> On 8/10/16, 2:04 PM, "Ashwin Chandra Putta" 
<ashwinchand...@gmail.com>
    >      >>> wrote:
    >      >>>
    >      >>>      In a separate thread I mean.
    >      >>>
    >      >>>      Regards,
    >      >>>      Ashwin.
    >      >>>
    >      >>>      On Wed, Aug 10, 2016 at 2:01 PM, Ashwin Chandra Putta <
    >      >>>      ashwinchand...@gmail.com> wrote:
    >      >>>
    >      >>>      > + dev@apex.apache.org
    >      >>>      > - us...@apex.apache.org
    >      >>>      >
    >      >>>      > This is one of those best practices that we learn by 
experience
    >      >>> during
    >      >>>      > operator development. It will save a lot of time during 
operator
    >      >>>      > development if we can catch and throw validation error 
when
    >      >>> someone
    >      >>> emits
    >      >>>      > tuples in a non separate thread.
    >      >>>      >
    >      >>>      > Regards,
    >      >>>      > Ashwin
    >      >>>      >
    >      >>>      > On Wed, Aug 10, 2016 at 1:57 PM, Munagala Ramanath <
    >      >>> r...@datatorrent.com>
    >      >>>      > wrote:
    >      >>>      >
    >      >>>      >> For cases where use of a different thread is needed, it 
can write
    >      >>> tuples
    >      >>>      >> to a queue from where the operator thread pulls them --
    >      >>>      >> JdbcPollInputOperator in Malhar has an example.
    >      >>>      >>
    >      >>>      >> Ram
    >      >>>      >>
    >      >>>      >> On Wed, Aug 10, 2016 at 1:50 PM, hsy...@gmail.com <
    >      >>> hsy...@gmail.com
    >      >>>      >> wrote:
    >      >>>      >>
    >      >>>      >>> Hey Vlad,
    >      >>>      >>>
    >      >>>      >>> Thanks for bringing this up. Is there an easy way to 
detect
    >      >>> unexpected
    >      >>>      >>> use of emit method without hurt the performance. Or at 
least if
    >      >>> we
    >      >>> can
    >      >>>      >>> detect this in debug mode.
    >      >>>      >>>
    >      >>>      >>> Regards,
    >      >>>      >>> Siyuan
    >      >>>      >>>
    >      >>>      >>> On Wed, Aug 10, 2016 at 11:27 AM, Vlad Rozov <
    >      >>> v.ro...@datatorrent.com>
    >      >>>      >>> wrote:
    >      >>>      >>>
    >      >>>      >>>> The short answer is no, creating worker thread to emit 
tuples
    >      >>> is
    >      >>> not
    >      >>>      >>>> supported by Apex and will lead to an undefined 
behavior.
    >      >>> Operators in Apex
    >      >>>      >>>> have strong thread affinity and all interaction with 
the
    >      >>> platform
    >      >>> must
    >      >>>      >>>> happen on the operator thread.
    >      >>>      >>>>
    >      >>>      >>>> Vlad
    >      >>>      >>>>
    >      >>>      >>>
    >      >>>      >>>
    >      >>>      >>
    >      >>>      >
    >      >>>      >
    >      >>>      > --
    >      >>>      >
    >      >>>      > Regards,
    >      >>>      > Ashwin.
    >      >>>      >
    >      >>>
    >      >>>
    >      >>>
    >      >>>      --
    >      >>>
    >      >>>      Regards,
    >      >>>      Ashwin.
    >      >>>
    >      >>>
    >      >>>
    >      >>>
    >      >>>
    >      >
    >      
    >
    >
    
    


Reply via email to