Tina Shan created MAPREDUCE-7328:
------------------------------------

             Summary: Job.monitorAndPrintJob can sleep for up to 596 
hours when jobclient.progress.monitor.poll.interval is misconfigured, causing 
the job to hang
                 Key: MAPREDUCE-7328
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7328
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: client
    Affects Versions: 3.3.0
            Reporter: Tina Shan


The monitoring loop sleeps between polls for a configurable interval, and there 
is little sanity checking on that value. When 
jobclient.progress.monitor.poll.interval is misconfigured to INT_MAX 
(2147483647 ms), a single sleep can last up to about 596 hours. During that 
time the thread is stuck and reports no progress to the user, even if the job 
moves forward. We suggest adding a cap value or a warning message.
  
{code:java}
public boolean monitorAndPrintJob()
    throws IOException, InterruptedException {
  ...
  while (!isComplete() || !reportedAfterCompletion) {
    if (isComplete()) {
      reportedAfterCompletion = true;
    } else {
      // Sleeps for the configured poll interval; no upper bound is applied.
      Thread.sleep(progMonitorPollIntervalMillis);
    }
    ...
  }
  ...
}
 {code}
This is similar to the bug reported in MAPREDUCE-7327.
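
As a rough illustration of the suggested cap and warning, the sketch below clamps a configured poll interval before it would reach Thread.sleep, and also reproduces the 596-hour figure. The class PollIntervalGuard, its method names, and the 1000 ms / 60000 ms bounds are hypothetical and chosen only for this example; they are not taken from the Hadoop source or from any committed fix.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper for illustration only; not part of Hadoop.
final class PollIntervalGuard {
    // Assumed bounds, picked for the example rather than taken from Hadoop.
    static final int DEFAULT_INTERVAL_MS = 1000;
    static final int MAX_INTERVAL_MS = 60_000;

    // Returns an interval that is safe to sleep on, warning when it changes the value.
    static int sanitize(int configuredMs) {
        if (configuredMs < 1) {
            System.err.println("WARN: poll interval " + configuredMs
                + " ms is below 1 ms, using default " + DEFAULT_INTERVAL_MS + " ms");
            return DEFAULT_INTERVAL_MS;
        }
        if (configuredMs > MAX_INTERVAL_MS) {
            System.err.println("WARN: poll interval " + configuredMs
                + " ms exceeds cap, using " + MAX_INTERVAL_MS + " ms");
            return MAX_INTERVAL_MS;
        }
        return configuredMs;
    }

    // Whole hours contained in a millisecond duration (fraction truncated).
    static long wholeHours(long millis) {
        return TimeUnit.MILLISECONDS.toHours(millis);
    }

    public static void main(String[] args) {
        // Integer.MAX_VALUE ms is 2147483647 ms, i.e. 596 whole hours.
        System.out.println(wholeHours(Integer.MAX_VALUE));
        // With the assumed cap, the same setting would sleep 60000 ms per poll.
        System.out.println(sanitize(Integer.MAX_VALUE));
    }
}
```

With a guard of this kind, the loop would call Thread.sleep on the sanitized value, so an extreme setting delays progress reports by at most the cap instead of weeks.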



