[ 
https://issues.apache.org/jira/browse/TEZ-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15606447#comment-15606447
 ] 

Hitesh Shah commented on TEZ-3271:
----------------------------------

Comments: 

{code}
boolean vertexFailuresBelowThreshold = vertex.succeededTaskCount + 
vertex.failedTaskCount == vertex.numTasks && vertex.failedTaskCount * 100 <= 
vertex.maxFailuresPercent * vertex.numTasks;
{code}
   - the above should have parentheses to make the code easier to understand at 
a first glance.
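
A minimal sketch of how the parenthesized check might read, pulled out into a standalone method so it can be run on its own. The class and method names are hypothetical, and treating maxFailuresPercent as a float on a 0-100 scale is an assumption inferred from the `* 100` in the expression above, not something confirmed by the patch.

```java
final class FailureThresholdSketch {

    /**
     * Hypothetical standalone version of the vertexFailuresBelowThreshold check.
     * Assumes maxFailuresPercent is a float on a 0-100 scale.
     */
    static boolean failuresBelowThreshold(int succeededTaskCount, int failedTaskCount,
                                          int numTasks, float maxFailuresPercent) {
        // Every task has reached a terminal state (succeeded or failed).
        boolean allTasksCompleted = (succeededTaskCount + failedTaskCount) == numTasks;
        // Failed tasks, as a percentage of all tasks, do not exceed the limit.
        boolean failuresWithinLimit = (failedTaskCount * 100) <= (maxFailuresPercent * numTasks);
        return allTasksCompleted && failuresWithinLimit;
    }

    public static void main(String[] args) {
        System.out.println(failuresBelowThreshold(95, 5, 100, 5.0f)); // at the limit
        System.out.println(failuresBelowThreshold(94, 6, 100, 5.0f)); // over the limit
        System.out.println(failuresBelowThreshold(90, 5, 100, 5.0f)); // tasks still pending
    }
}
```

Naming the two halves of the condition also makes the operator precedence explicit, which is the part that is easy to misread in the original one-liner.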
   
{code}
          LOG.info("All tasks have completed and the number of failed tasks is within threshold, vertex:" + vertex.logIdentifier);
{code}
  - would be useful to log the count of failed tasks, the total task count, and 
the threshold to aid debugging. 
  - Do the task completion events to history need to be changed to publish this 
info too? Can be done in a follow-up jira but should be done and made visible 
via the UI. 
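
For the logging suggestion, a rough sketch of what the richer message could look like. The helper class, method, and the key=value field labels are made up for illustration; in the patch this would presumably just be an extended LOG.info call using the vertex fields directly.

```java
final class ThresholdLogMessageSketch {

    /**
     * Hypothetical helper that builds the log message with the extra counts.
     * Parameter names mirror the vertex fields quoted in the review.
     */
    static String describe(String vertexLogIdentifier, int failedTaskCount,
                           int numTasks, float maxFailuresPercent) {
        return "All tasks have completed and the number of failed tasks is within threshold"
            + ", vertex:" + vertexLogIdentifier
            + ", failedTaskCount=" + failedTaskCount
            + ", numTasks=" + numTasks
            + ", maxFailuresPercent=" + maxFailuresPercent;
    }

    public static void main(String[] args) {
        System.out.println(describe("vertex_1 [Map 1]", 5, 100, 5.0f));
    }
}
```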

{code}
                    TezTaskAttemptID attempt = task.getAttempts().keySet().iterator().next();
                    LOG.info("Succeeding failed task attempt:" + attempt);
{code}
   - why pick the first attempt instead of the last one? 
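
To illustrate the alternative, a small sketch of picking the last attempt rather than the first. This assumes the map returned by getAttempts() iterates in insertion order (e.g. a LinkedHashMap), which is not confirmed here; plain String keys stand in for TezTaskAttemptID, and the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class LastAttemptSketch {

    /**
     * Returns the last key in iteration order, or null if the map is empty.
     * Only meaningful as "latest attempt" if the map preserves insertion order.
     */
    static <K> K lastKey(Map<K, ?> attempts) {
        K last = null;
        for (K key : attempts.keySet()) {
            last = key; // keep overwriting until the final key remains
        }
        return last;
    }

    public static void main(String[] args) {
        // String ids are placeholders for TezTaskAttemptID values.
        Map<String, String> attempts = new LinkedHashMap<>();
        attempts.put("attempt_0", "FAILED");
        attempts.put("attempt_1", "FAILED");
        attempts.put("attempt_2", "FAILED");
        System.out.println(lastKey(attempts));
    }
}
```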

{code}
generateEmptyEventsForAttempt(TezTaskAttemptID attempt) throws Exception
{code}
  - any particular reason for the generic exception as compared to a specific 
one being thrown? 

TEZ_VERTEX_FAILURES_MAXPERCENT needs a minor doc improvement to clarify whether 
the values are meant to be 0.0-1.0f or 0.0-100.0f. 

No test changes for TestVertexImpl? testVertexFailuresMaxPercent() does a good 
high level end to end verification but we probably need some unit tests at the 
VertexImpl level to test thresholds, event generation, etc. 





> Provide mapreduce failures.maxpercent equivalent
> ------------------------------------------------
>
>                 Key: TEZ-3271
>                 URL: https://issues.apache.org/jira/browse/TEZ-3271
>             Project: Apache Tez
>          Issue Type: New Feature
>            Reporter: Jonathan Eagles
>            Assignee: Jonathan Eagles
>         Attachments: Succeeded with Failures.png, TEZ-3271.1.patch, 
> TEZ-3271.2.patch, TEZ-3271.3.patch, TEZ-3271.4.patch, TEZ-3271.5.patch, 
> TEZ-3271.6.patch, TEZ-3271.7.patch, TEZ-3271.8.patch
>
>
> There is a certain category of work that need not have 100% of tasks succeed 
> to cause the work to be considered a success. To meet that end, I propose we 
> provide a tez equivalent of mapreduce.map.failures.maxpercent and 
> mapreduce.reduce.failures.maxpercent. In this way a vertex will be considered 
> a success if the number of failures is below a configured threshold.



