[ https://issues.apache.org/jira/browse/TEZ-1539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14121147#comment-14121147 ]
Siddharth Seth commented on TEZ-1539:
-------------------------------------
IMO, a model where events are sent only on success is very useful, but there
can also be a requirement to send events while a task is still running.
All current use cases fall into the first category, where events are sent on
completion.
There are several ways to support both.
1) Modify the sendEvent API to include a FIRE_ONCE_ON_SUCCESS option. This
can get messy and confusing if similar events are sent both with and without
the flag, and it requires adding a new sendEvents API. The current
'return event list' from method calls isn't usable either.
2) Certain kinds of events (the current VMEvents and InputInitializerEvents)
always follow the fire-once-on-success model. Additional event types can be
used when events need to be sent during execution; such events would need
versioning and enough information for user code to de-dupe them.
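With option 2, de-duplicating events sent during execution is left to user code. The sketch below shows one possible receiver-side policy, assuming a hypothetical event that carries the producing task's index, its attempt number, and a per-attempt version (sequence number), and assuming retries are deterministic so that a re-sent version carries the same payload. `RuntimeEventDeduper` and `VersionedEvent` are illustrative names only and do not exist in Tez.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical receiver-side de-dupe for events sent while tasks run (not Tez code). */
public class RuntimeEventDeduper {

    /** Hypothetical versioned event; field names are assumptions for illustration. */
    static final class VersionedEvent {
        final int taskIndex; // task that produced the event
        final int attempt;   // attempt number; carried along but not used by this policy
        final int version;   // per-attempt sequence number, restarts at 0 on a retry
        final String payload;

        VersionedEvent(int taskIndex, int attempt, int version, String payload) {
            this.taskIndex = taskIndex;
            this.attempt = attempt;
            this.version = version;
            this.payload = payload;
        }
    }

    // Versions already processed, per producing task.
    private final Map<Integer, Set<Integer>> seenVersions = new HashMap<>();
    // Payloads handed to user logic, in arrival order.
    private final List<String> processed = new ArrayList<>();

    /** Processes the event if its (taskIndex, version) pair is new; returns false for duplicates. */
    public boolean accept(VersionedEvent e) {
        boolean isNew = seenVersions
                .computeIfAbsent(e.taskIndex, k -> new HashSet<>())
                .add(e.version);
        if (isNew) {
            processed.add(e.payload);
        }
        return isNew;
    }

    public List<String> processed() {
        return processed;
    }

    public static void main(String[] args) {
        RuntimeEventDeduper d = new RuntimeEventDeduper();
        d.accept(new VersionedEvent(0, 0, 0, "a"));
        d.accept(new VersionedEvent(0, 0, 1, "b"));
        // Attempt 0 fails; attempt 1 re-sends versions 0 and 1, then continues.
        d.accept(new VersionedEvent(0, 1, 0, "a"));
        d.accept(new VersionedEvent(0, 1, 1, "b"));
        d.accept(new VersionedEvent(0, 1, 2, "c"));
        System.out.println(d.processed()); // [a, b, c]
    }
}
```

If retries were not deterministic, keying only on the version would silently keep stale payloads, which is why such events need enough information (here, the attempt number) for user code to apply a stricter policy.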
I favor the second approach. For now, VMEvents and InputInitializerEvents
(which cover all events that go to the AM) would follow the fire-once
semantics.
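A minimal sketch of the fire-once-on-success semantics, as they might apply to VMEvents and InputInitializerEvents: events are buffered per task attempt, dropped if that attempt fails, and released to the consumer only for the first attempt of each task that succeeds. `FireOnceOnSuccessBuffer` and its methods are hypothetical and only illustrate the semantics; this is not how Tez implements it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical fire-once-on-success event buffer (illustration only, not Tez code). */
public class FireOnceOnSuccessBuffer<E> {
    // Events held back per attempt, keyed by "taskIndex/attemptNumber".
    private final Map<String, List<E>> pending = new HashMap<>();
    // Tasks that already had a successful attempt.
    private final Set<Integer> completedTasks = new HashSet<>();
    // Events released to the consumer (e.g. a vertex manager), in release order.
    private final List<E> released = new ArrayList<>();

    private static String key(int taskIndex, int attempt) {
        return taskIndex + "/" + attempt;
    }

    /** A running attempt generated an event; nothing is delivered yet. */
    public void onEvent(int taskIndex, int attempt, E event) {
        pending.computeIfAbsent(key(taskIndex, attempt), k -> new ArrayList<>()).add(event);
    }

    /** A failed or killed attempt simply has its buffered events discarded. */
    public void onAttemptFailed(int taskIndex, int attempt) {
        pending.remove(key(taskIndex, attempt));
    }

    /** Releases the attempt's events only if no earlier attempt of the task succeeded. */
    public void onAttemptSucceeded(int taskIndex, int attempt) {
        List<E> events = pending.remove(key(taskIndex, attempt));
        if (completedTasks.add(taskIndex) && events != null) {
            released.addAll(events);
        }
    }

    public List<E> releasedEvents() {
        return Collections.unmodifiableList(released);
    }

    public static void main(String[] args) {
        FireOnceOnSuccessBuffer<String> buf = new FireOnceOnSuccessBuffer<>();
        buf.onEvent(0, 0, "t0-a0");
        buf.onEvent(0, 1, "t0-a1");   // second (e.g. speculative) attempt of task 0
        buf.onAttemptSucceeded(0, 1); // first success wins
        buf.onAttemptSucceeded(0, 0); // later success is ignored
        buf.onEvent(1, 0, "t1-a0");
        buf.onAttemptFailed(1, 0);    // failed attempt: its events are discarded
        buf.onEvent(1, 1, "t1-a1");
        buf.onAttemptSucceeded(1, 1);
        System.out.println(buf.releasedEvents()); // [t0-a1, t1-a1]
    }
}
```

Because later successful attempts are ignored, this model is only correct when all attempts of a task produce the same events, which is the assumption spelled out in the quoted issue description.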
> Allow a FIRE_ONCE_ON_SUCCESS model for events generated by user code
> --------------------------------------------------------------------
>
> Key: TEZ-1539
> URL: https://issues.apache.org/jira/browse/TEZ-1539
> Project: Apache Tez
> Issue Type: Improvement
> Reporter: Siddharth Seth
> Assignee: Siddharth Seth
>
> Specifically for InputInitializerEvents and VertexManagerEvents.
> Pasting a comment from TEZ-1447:
> In the majority of cases, events generated by different attempts of the same
> task will be identical, in which case using just the event generated by the
> first successful attempt is adequate. Doing this means that users don't need
> to worry about retries, indices, etc., and can simply rely on receiving a set
> of events to be processed once the vertex succeeds.
> If different attempts of the same workload generate different events,
> processing is likely to be incorrect: it's quite possible for all data to be
> processed (the vertex succeeds), and then for a failure and retry to generate
> a different event. The initializer doesn't even run at that point, since it
> has already done its work and completed. Handling such scenarios likely
> involves re-running the entire initializer and restarting, from scratch, the
> vertex that processed the event. In situations like this, where the generated
> data may differ, the best bet is to disable speculation (once it's
> supported) and to set max-attempts to 1.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)