unhealthy

Robert Joseph Evans (Commented) (JIRA) Fri, 06 Apr 2012 08:13:47 -0700

    [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13248411#comment-13248411
 ]


Robert Joseph Evans commented on MAPREDUCE-3921:
------------------------------------------------

A few minor comments about the patch, and some questions on the manual testing 
that was done on it.  Overall the patch looks very good and once the javac 
warnings are addressed and I know manual testing was performed I am a +1 on it

  # Have you tested this with AM Recovery?  Specifically I would like to see 
the AM recover when a map task finished successfully and then was killed 
because the node went bad.
  # Have you tested this with reduces?  The code will reschedule the map task, 
but I don't really see where/if it informs the reducer that it is rescheduling 
the map task until that new task finishes successfully.  I believe that the 
reducer would just ignore an update for a task it has already fetched 
successfully, but I just want to be sure it was tested.
  # NodeState.isUnhealthy()  (Very minor) I think it would be cleaner, to have 
it be {code}
return this == UNHEALTHY ||
       this == DECOMMISSIONED ||
       this == LOST;
{code}
  # KilledAfterSuccessTransition.transition() There is some commented out code 
{code}
// why set a wrong finish time ???
//set the finish time
//taskAttempt.setFinishTime();
{code} Is this needed? If not please remove it.
  # KilledAfterSuccessTransition.transition() I am a bit confused by the log 
statement {code}
if (taskAttempt.getLaunchTime() != 0) {
  ...
}else {
  LOG.debug("Not generating HistoryFinish event since start event not generated 
for taskAttempt: "
      + taskAttempt.getID());
}
{code} Is this really needed (looks like it was a copy and paste from the 
KilledTransition)?  When would we even get a successful job that did not have a 
launch time?  I would rather have it be an ERROR or WARN rather then a debug if 
we did see this in this transition.
  # TaskAttemptCompletedEventTransition.transition() {code}
// TODO assert nodeId is not null
{code} please either add in the assert or remove the TODO.

                
> MR AM should act on the nodes liveliness information when nodes go 
> up/down/unhealthy
> ------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3921
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3921
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mr-am, mrv2
>    Affects Versions: 0.23.0
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Bikas Saha
>             Fix For: 0.23.2
>
>         Attachments: MAPREDUCE-3921-branch-0.23.patch, 
> MAPREDUCE-3921-branch-0.23.patch, MAPREDUCE-3921-branch-0.23.patch, 
> MAPREDUCE-3921.patch
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MAPREDUCE-3921) MR AM should act on the nodes liveliness information when nodes go up/down/unhealthy

Reply via email to