[ 
https://issues.apache.org/jira/browse/EDGENT-382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15893024#comment-15893024
 ] 

Dale LaBossiere commented on EDGENT-382:
----------------------------------------

So at a high level, was the intent that {{JobMonitorApp}} be the 
resiliency answer?

Some questions come to mind:
- What was the thinking wrt uniformity of behavior among the providers? Is the 
absence of {{JobMonitorApp}} from DirectProvider just an omission?
- What was the thinking wrt uniformity of behavior within an app wrt such 
exceptions?  i.e., if any "application supplied function" throws, should the 
job get restarted?  (assuming the Isolate and Barrier items noted above were 
addressed)
- What was the thinking wrt additional alternative behaviors such as a less 
heavy-handed "just log and continue"?


> A RuntimeException thrown while processing a tuple brings down the whole 
> topology
> ---------------------------------------------------------------------------------
>
>                 Key: EDGENT-382
>                 URL: https://issues.apache.org/jira/browse/EDGENT-382
>             Project: Edgent
>          Issue Type: Bug
>          Components: Runtime
>            Reporter: Dale LaBossiere
>         Attachments: DlabossExceptionTest.java
>
>
> I encountered the above in the context of the WIoTP connector, and
> there may be a problem there as well, but it’s trivial to demonstrate the
> problem in a more general context.
> That is, a RuntimeException thrown from a Topology.poll(), generate(), or 
> source() supplier, or from an unisolated user function downstream of the 
> source, such as a map() or sink() function, causes the topology to terminate 
> immediately.  That typically causes the process to terminate.
> It's unclear to me which parts of the runtime should be doing what with 
> respect to this.
> Things need to be more resilient in the face of transient errors, 
> particularly wrt transient connector problems.  For example, 
> MqttPublisher.accept() achieved resiliency in the face of transient 
> connection problems by logging instead of throwing.  The IotpDevice connector 
> just throws... which at a certain level is OK/desirable... if the runtime 
> were to handle resiliency issues.
> Note, a RuntimeException from a Topology.events() supplier or even a 
> downstream function doesn't result in topology termination.  That's because 
> the runtime thread blocking awaiting the next supplied tuple doesn't see the 
> RuntimeException.  And for the downstream case, the stream is Isolated so 
> again the runtime thread doesn't see the exception.  That said, the thread 
> internal to Isolate silently terminates in the face of a downstream 
> exception.  ugh.  (Barrier looks to have a similar problem).
> There needs to be some clear / prominent doc on all of this, what the design 
> / behavior is supposed to be, and then we can address any issues in the light 
> of that understanding.



