[
https://issues.apache.org/jira/browse/TEZ-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15607276#comment-15607276
]
Jonathan Eagles commented on TEZ-3271:
--------------------------------------
bq. the above should have parentheses to make the code more understandable at
first glance.
Added parentheses to the boolean logic.
bq. would be useful to log count of failed tasks, total tasks, threshold to aid
debugging.
Modified logging to include the failed tasks, total tasks, and threshold, in
line with the diagnostic message.
bq. Do the task completion events to history need to be changed to publish this
info too? Can be done in a follow-up jira but should be done and made visible
via the UI
Will make sure to handle this in the follow-up jira
bq. why pick the first attempt instead of the last one?
Switched logic to pick the last one. I had originally picked the first one to
avoid iterating over the whole list, which the provided API requires.
bq. any particular reason for the generic exception as compared to a specific
one being thrown?
This function throws TezException and IOException. Let me know what the right
thing to do is in this particular situation.
bq. No test changes for TestVertexImpl? testVertexFailuresMaxPercent() does a
good high level end to end verification but we probably need some unit tests at
the VertexImpl level to test thresholds, event generation, etc.
Added both positive and negative tests to TestVertexImpl.
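For reference, a minimal sketch of the kind of threshold check and log text discussed above. This is not the code from the patch: the class name, method names, and parameter names are hypothetical, and treating a failure percentage exactly equal to the threshold as tolerated is an assumption.

```java
// Hypothetical illustration only; not taken from the TEZ-3271 patch.
public final class FailureThresholdSketch {

    private FailureThresholdSketch() {
    }

    // Returns true when the share of failed tasks is within the allowed percentage.
    // Assumption: a failure percentage exactly equal to the threshold is tolerated.
    static boolean withinFailureThreshold(int failedTasks, int totalTasks,
                                          float maxFailuresPercent) {
        // Parentheses make the grouping of the boolean logic explicit.
        return (failedTasks == 0)
            || ((totalTasks > 0)
                && ((failedTasks * 100.0f) <= (maxFailuresPercent * totalTasks)));
    }

    // Builds a log/diagnostic string carrying the three values useful for debugging.
    static String describe(int failedTasks, int totalTasks, float maxFailuresPercent) {
        return "failedTasks=" + failedTasks
            + ", totalTasks=" + totalTasks
            + ", maxFailuresPercent=" + maxFailuresPercent;
    }

    public static void main(String[] args) {
        int failed = 2;
        int total = 10;
        float threshold = 25.0f;
        System.out.println(describe(failed, total, threshold)
            + ", withinThreshold=" + withinFailureThreshold(failed, total, threshold));
    }
}
```

Comparing `failedTasks * 100` against `maxFailuresPercent * totalTasks` avoids a division, so a vertex with zero tasks needs no special-casing beyond the explicit `totalTasks > 0` guard.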
> Provide mapreduce failures.maxpercent equivalent
> ------------------------------------------------
>
> Key: TEZ-3271
> URL: https://issues.apache.org/jira/browse/TEZ-3271
> Project: Apache Tez
> Issue Type: New Feature
> Reporter: Jonathan Eagles
> Assignee: Jonathan Eagles
> Attachments: Succeeded with Failures.png, TEZ-3271.1.patch,
> TEZ-3271.2.patch, TEZ-3271.3.patch, TEZ-3271.4.patch, TEZ-3271.5.patch,
> TEZ-3271.6.patch, TEZ-3271.7.patch, TEZ-3271.8.patch, TEZ-3271.9.patch
>
>
> There is a certain category of work that need not have 100% of tasks succeed
> to cause the work to be considered a success. To meet that end, I propose we
> provide a tez equivalent of mapreduce.map.failures.maxpercent and
> mapreduce.reduce.failures.maxpercent. In this way a vertex will be considered
> a success if the number of failures is below a configured threshold.