[
https://issues.apache.org/jira/browse/TEZ-1267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14179718#comment-14179718
]
Siddharth Seth commented on TEZ-1267:
-------------------------------------
bq. The issue I mention is that we have inline call of ROUTE_EVENT_TRANSITION
when handling the events in pendingList, and we assume it would always succeed,
just ignore the returned state. So I separate the real handling TezEvent work
into another method handleRoutedTezEvents(), and call this method when we'd
like to handle the events in the pendingList.
Nice catch! Would've ended up skipping a bunch of exceptions without some
changes to this code.
On the way the code is changed to handle this right now, I think it's a little
safer to just move the entire RouteEventTransition.handle(...) function into a
separate method, instead of splitting it up into one method which Routes the
events upstream, and another one which handles events meant for this vertex. A
couple of reasons for this - 1) there's multiple ways to split the RouteEvent
logic - along these lines (meant for self vs meant for other) vs All logic for
a event type goes together (separate by event type), and 2) It's a big change
which copies some code from the method into the other method - which most
likely is correct, but I'd prefer if this were made in 0.6.0, instead of the
0.5 line (especially not in 0.5.2), 3) Even after this split, there's one
invocation of ROUTE_EVENT_TRANSITION.transition outside of the state machine.
The split of the method can be a separate jira.
For this, just moving the handle logic into a separate method and invoking it
should be sufficient. The RouteTransition would end up changing to just invoke
this method, handle exceptions and report the appropriate state.
One correction from my previous comment. ROUTE_EVENT should be handled in
INITIALIZING and INITED state - since the VMs and InputInitializers are setup
by this point, and event arrival doesn't depend on the state of the vertex -
and need to be forwarded to the VM/II early. Sorry about that.
The VertexManagerException is a useful change - since it is translated within
the VertexManager itself.
- There's some inconsistent usage of this though. There's some places where
VertexManagerException is caught, in others a generic Exception is caught
(especially the invocation of ROUTE_EVENT_TRANSITION.transition). Catching an
Exception and reporting it as a VertexManager error in diagnostics can be
incorrect since the exception could have originated from Edge routing.
- Also, do you plan on introducing an EdgeManagerException + IIException in
TEZ-1689 ? It may be better to just use a single type of exception, with an
enumeration defining the source.
- This needs to be annotated as private, and some limited javadoc on the intent
of the exception would be useful.
- The final exception reported back to the user should be
VertexManagerException.getCause() - users don't need to be aware of this
wrapping.
- Is the entire stack trace too much information for the diagnostic message, or
is that standard practice for other cases as well ?
In the new tests, is it possible to verify the Termination cause ? That's a
good set of tests btw.
Minor stuff in the code
- Remove commented out line in TestExceptionPropagation
- Typo - "VertexManageEvent"
- s/Exception happen(s)/Exception in/
> Exception handling when Routing Events
> --------------------------------------
>
> Key: TEZ-1267
> URL: https://issues.apache.org/jira/browse/TEZ-1267
> Project: Apache Tez
> Issue Type: Sub-task
> Reporter: Siddharth Seth
> Assignee: Jeff Zhang
> Priority: Critical
> Attachments: TEZ-1267-2.patch, TEZ-1267-3.patch, Tez-1267.patch
>
>
> Events are generated by user code. In some places they're also handled by
> user code within the AM. Currently, exceptions which are generated when
> handling user code will end up killing the AM (and hence leading to a retry).
> Instead, failure to handle such events, should cause the application to fail.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)