[jira] [Comment Edited] (TEZ-3271) Provide mapreduce failures.maxpercent equivalent

Jonathan Eagles (JIRA) Tue, 25 Oct 2016 07:34:23 -0700

    [ https://issues.apache.org/jira/browse/TEZ-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15603334#comment-15603334 ]


Jonathan Eagles edited comment on TEZ-3271 at 10/25/16 2:32 PM:
----------------------------------------------------------------

bq. generateEmptyEventsForSourceTask in EdgeManagerPlugin should not be an 
abstract function. Given that CartesianProductEdgeManager needs changing this 
is an incompatible feature. An appropriate exception thrown could be used to 
indicate that the EM plugin in use does not support the failure threshold 
percent feature.
If we strictly limit this feature to known Tez outputs, we can skip empty 
event generation in the edge manager plugin for now and move that 
responsibility up to the edge.

bq. I think we can add a fail-safe in the edge plugins to generate the events 
only for known outputs (maybe if they belong the tez runtime package ? )
I can have the Edge throw an exception so that this is restricted to 
org.apache.tez outputs only.
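
A minimal sketch of what such a restriction could look like, assuming the 
check is done on the output's class name. KnownOutputGuard and its method 
names are hypothetical and are not part of the Tez API or of any attached 
patch; only the org.apache.tez package prefix comes from the discussion above.

```java
// Hypothetical helper, not part of Tez: restricts empty-event generation
// to outputs whose class lives under the org.apache.tez package.
final class KnownOutputGuard {
    // Trailing dot so that e.g. "org.apache.tezfoo.X" does not match.
    static final String TEZ_PACKAGE_PREFIX = "org.apache.tez.";

    private KnownOutputGuard() {}

    /** Returns true if the output class name is inside org.apache.tez. */
    static boolean isKnownTezOutput(String outputClassName) {
        return outputClassName != null
            && outputClassName.startsWith(TEZ_PACKAGE_PREFIX);
    }

    /** Throws if the output is not a known Tez output. */
    static void checkSupportsEmptyEvents(String outputClassName) {
        if (!isKnownTezOutput(outputClassName)) {
            throw new UnsupportedOperationException(
                "Failure threshold percent is not supported for output "
                    + outputClassName);
        }
    }
}
```

With this shape, a custom output outside org.apache.tez fails fast with a 
clear message instead of receiving an empty event it may not understand.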

bq. i.e. if someone ends up writing a new output that uses a different payload 
we would need to throw an error at least with the current impl though we do 
need to figure out how the EM plugin can invoke an empty event that the Input 
understands. One option here would be to enhance the DME meta info to indicate 
empty/null payload or invoke an api on the Output to generate the empty data 
event.
I think this is about how to implement the feature completely generically. It 
should go into a follow-up JIRA if we are using this JIRA to implement a 
stop-gap until a full-blown implementation can be finished.

bq. As for event generation, I have a doubt with respect to recovery given that 
we expect all DME events to be generated before a task completes. This might be 
something to test more carefully on recovery to see if events are generated 
correctly as needed when a failed vertex is recovered or replayed as needed.
I will look into this.

bq. Unit test could be moved to TestTezJobs. At some point we probably need to 
get rid of a lot of the TestMRR* minicluster tests.
I am assuming you mean reimplementing the test in a non-MR way rather than 
just moving the code over, so I will approach this comment from that 
perspective.


was (Author: jeagles):
bq. generateEmptyEventsForSourceTask in EdgeManagerPlugin should not be an 
abstract function. Given that CartesianProductEdgeManager needs changing this 
is an incompatible feature. An appropriate exception thrown could be used to 
indicate that the EM plugin in use does not support the failure threshold 
percent feature.
If we strictly limit this feature to know tez outputs, we can avoid empty event 
generation at this time in the edge manager plugin and can promote that to the 
edge.

bq. I think we can add a fail-safe in the edge plugins to generate the events 
only for known outputs (maybe if they belong the tez runtime package ? )
I add exception throwing to the Edge to restrict this to org.apache.tez outputs 
only

bq. i.e. if someone ends up writing a new output that uses a different payload 
we would need to throw an error at least with the current impl though we do 
need to figure out how the EM plugin can invoke an empty event that the Input 
understands. One option here would be to enhance the DME meta info to indicate 
empty/null payload or invoke an api on the Output to generate the empty data 
event.
I think this is aimed at how to implement this completely generically and 
should go into a follow up JIRA if we are using this jira to implement a 
stop-gap until a full blow implementation can be finished.

bq. As for event generation, I have a doubt with respect to recovery given that 
we expect all DME events to be generated before a task completes. This might be 
something to test more carefully on recovery to see if events are generated 
correctly as needed when a failed vertex is recovered or replayed as needed.
Will see about this.

bq. Unit test could be moved to TestTezJobs. At some point we probably need to 
get rid of a lot of the TestMRR* minicluster tests.
I am assuming you mean to reimplement in a non-mr way and not to just move the 
code over and so will approach this comment from that perspective.

> Provide mapreduce failures.maxpercent equivalent
> ------------------------------------------------
>
>                 Key: TEZ-3271
>                 URL: https://issues.apache.org/jira/browse/TEZ-3271
>             Project: Apache Tez
>          Issue Type: New Feature
>            Reporter: Jonathan Eagles
>            Assignee: Jonathan Eagles
>         Attachments: Succeeded with Failures.png, TEZ-3271.1.patch, 
> TEZ-3271.2.patch, TEZ-3271.3.patch, TEZ-3271.4.patch, TEZ-3271.5.patch, 
> TEZ-3271.6.patch, TEZ-3271.7.patch
>
>
> There is a certain category of work that need not have 100% of tasks succeed 
> to cause the work to be considered a success. To meet that end, I propose we 
> provide a tez equivalent of mapreduce.map.failures.maxpercent and 
> mapreduce.reduce.failures.maxpercent. In this way a vertex will be considered 
> a success if the number of failures is below a configured threshold.
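
The threshold described above can be sketched as a simple percentage check. 
This is an illustration under stated assumptions, not the code from the 
attached patches: FailureThreshold, withinThreshold, and the parameter names 
are hypothetical, and treating a failure count exactly at the threshold as 
acceptable is an assumption of this sketch.

```java
// Hypothetical helper, not taken from the TEZ-3271 patches.
final class FailureThreshold {
    private FailureThreshold() {}

    /**
     * Returns true if failedTasks is at most maxFailurePercent of totalTasks.
     * Assumption: a count exactly at the threshold still counts as success.
     */
    static boolean withinThreshold(int totalTasks, int failedTasks,
                                   double maxFailurePercent) {
        if (totalTasks < 0 || failedTasks < 0 || failedTasks > totalTasks) {
            throw new IllegalArgumentException("invalid task counts");
        }
        if (maxFailurePercent < 0.0 || maxFailurePercent > 100.0) {
            throw new IllegalArgumentException("percent must be in [0, 100]");
        }
        // Compare cross-multiplied values to avoid dividing by zero tasks.
        return failedTasks * 100.0 <= maxFailurePercent * totalTasks;
    }
}
```

For example, with 100 tasks and a 10 percent threshold, 5 failed tasks would 
leave the vertex successful while 11 failed tasks would not.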



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
