[ https://issues.apache.org/jira/browse/TEZ-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15607276#comment-15607276 ]

Jonathan Eagles commented on TEZ-3271:
--------------------------------------

bq. the above should have parentheses to make the code more understandable at a 
first glance.
Added parentheses to the boolean logic.
bq. would be useful to log count of failed tasks, total tasks, threshold to aid 
debugging.
Modified the logging to include the failed task count, total task count, and 
threshold, in line with the diagnostic message.
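The reply above says the log line now carries the failed task count, total task count, and threshold, matching the diagnostic message. One way to keep the two from drifting apart is to build the text once and reuse it. This is a minimal sketch under that assumption; `FailureThresholdMessage`, its `format` method, and the message wording are hypothetical and not taken from the patch.

```java
import java.util.Locale;

/**
 * Hypothetical helper (not the actual Tez code): builds one message that can
 * be used for both the log line and the vertex diagnostics, so the two always
 * report the same failed-task count, total-task count and threshold.
 */
final class FailureThresholdMessage {
  private FailureThresholdMessage() {}

  static String format(String vertexName, int failedTasks, int totalTasks,
                       float maxFailurePercent) {
    // Locale.ROOT keeps the decimal separator stable across JVM locales.
    return String.format(Locale.ROOT,
        "Vertex %s succeeded with %d failed task(s) out of %d total task(s); "
            + "allowed failure threshold is %.1f%%",
        vertexName, failedTasks, totalTasks, maxFailurePercent);
  }
}
```

A caller would pass the same string to its logger and to whatever records the vertex diagnostics, so both report identical numbers.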
bq. Do the task completion events to history need to be changed to publish this 
info too? Can be done in a follow-up jira but should be done and made visible 
via the UI
Will make sure to handle this in the follow-up JIRA.
bq. why pick the first attempt instead of the last one?
Switched the logic to pick the last one. I originally picked the first one to 
avoid iterating the whole list, which the provided API requires in order to 
reach the last element.
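The trade-off behind originally picking the first attempt can be illustrated with a plain `Iterable`: reading the first element takes one step, while reaching the last one means walking the entire sequence. This sketch assumes attempts are iterated in creation order; `AttemptSelection` and its methods are hypothetical and do not reflect the actual Tez task or attempt API.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

/**
 * Hypothetical illustration: when attempts are only exposed as an Iterable,
 * the first element is O(1) to read but the last one is O(n).
 */
final class AttemptSelection {
  private AttemptSelection() {}

  static <T> T first(Iterable<T> attempts) {
    Iterator<T> it = attempts.iterator();
    if (!it.hasNext()) {
      throw new NoSuchElementException("no attempts");
    }
    return it.next(); // stops after a single element
  }

  static <T> T last(Iterable<T> attempts) {
    Iterator<T> it = attempts.iterator();
    if (!it.hasNext()) {
      throw new NoSuchElementException("no attempts");
    }
    T current = it.next();
    while (it.hasNext()) { // must visit every remaining element
      current = it.next();
    }
    return current;
  }
}
```

If the underlying collection offered indexed or reverse access, the last element could be read directly; with iteration-only access, the full walk is the cost of choosing the most recent attempt.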
bq. any particular reason for the generic exception as compared to a specific 
one being thrown?
This function throws TezException and IOException. Let me know what the right 
thing to do is in this particular situation.
bq. No test changes for TestVertexImpl? testVertexFailuresMaxPercent() does a 
good high level end to end verification but we probably need some unit tests at 
the VertexImpl level to test thresholds, event generation, etc.
Added both positive and negative tests to TestVertexImpl.


> Provide mapreduce failures.maxpercent equivalent
> ------------------------------------------------
>
>                 Key: TEZ-3271
>                 URL: https://issues.apache.org/jira/browse/TEZ-3271
>             Project: Apache Tez
>          Issue Type: New Feature
>            Reporter: Jonathan Eagles
>            Assignee: Jonathan Eagles
>         Attachments: Succeeded with Failures.png, TEZ-3271.1.patch, 
> TEZ-3271.2.patch, TEZ-3271.3.patch, TEZ-3271.4.patch, TEZ-3271.5.patch, 
> TEZ-3271.6.patch, TEZ-3271.7.patch, TEZ-3271.8.patch, TEZ-3271.9.patch
>
>
> There is a certain category of work that does not need 100% of its tasks to 
> succeed in order to be considered a success. To that end, I propose we 
> provide a Tez equivalent of mapreduce.map.failures.maxpercent and 
> mapreduce.reduce.failures.maxpercent. In this way, a vertex will be considered 
> a success if the percentage of failed tasks is below a configured threshold.
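The proposed rule can be sketched as a single check on the failed fraction of a vertex's tasks. This is a minimal illustration, not the Tez implementation: `VertexFailureThreshold` and `withinThreshold` are hypothetical names, the threshold is assumed to be a percentage from 0 to 100, and, following the wording "below a configured threshold", a failure percentage exactly equal to the threshold is treated as failing the vertex (whether that boundary is inclusive is an assumption).

```java
/**
 * Hypothetical sketch (illustrative names, not the Tez API) of the proposed
 * rule: a vertex with some failed tasks still succeeds if the percentage of
 * failed tasks stays below a configured threshold.
 */
final class VertexFailureThreshold {
  private VertexFailureThreshold() {}

  /**
   * @param failedTasks       number of tasks that failed
   * @param totalTasks        number of tasks in the vertex
   * @param maxFailurePercent assumed range 0..100; 0 allows no failures
   * @return true if the vertex may be considered a success
   */
  static boolean withinThreshold(int failedTasks, int totalTasks,
                                 double maxFailurePercent) {
    if (failedTasks == 0) {
      return true; // ordinary success, threshold not involved
    }
    if (totalTasks <= 0) {
      return false; // defensive: failures reported with no tasks
    }
    // failed/total < max/100, rearranged to avoid dividing by totalTasks.
    // Strict '<' follows "below"; equality failing the vertex is an assumption.
    return (failedTasks * 100.0) < (maxFailurePercent * totalTasks);
  }
}
```

For example, with 100 tasks and a 5% threshold, 4 failed tasks would pass and 5 would not. With a threshold of 0, any failed task fails the vertex, which matches the all-tasks-must-succeed behavior the description contrasts with.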



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
