[ 
https://issues.apache.org/jira/browse/HADOOP-3618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12607533#action_12607533
 ] 

Steve Loughran commented on HADOOP-3618:
----------------------------------------

The wait loop spins and sleeps:

     
+    // check if the jobtracker is ready
+    while (true) {
+      if (jobSubmitClient.isReady()) {
+        break;
+      }
+      try {
+        Thread.sleep(JOBTRACKER_POLL_INTERVAL);
+      } catch (InterruptedException ie){}
+    }

1. If the thread is interrupted, it implies somebody wanted to stop it. Why not 
listen to that request by ending the thread, rather than spinning indefinitely? 
As written, this loop makes a job client thread impossible to kill in-process 
until the tracker is live.

2. In other projects, we've found problems when a few hundred machines come up 
fully synchronised, as they can when a site's power gets toggled. They all 
poll simultaneously, flood the network and then wait. Even with exponential 
back-off they stay in sync. So a bit of random jitter on the sleep is good; 
likewise, the poll interval may be worth making a configuration point.
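
To make both points concrete, here is a minimal sketch of how the loop could look. It is not taken from the patch: the class name ReadyWait, both method names and the +/-50% jitter range are assumptions chosen for illustration, and the readiness check is passed in as a BooleanSupplier standing in for jobSubmitClient.isReady().

```java
import java.util.Random;
import java.util.function.BooleanSupplier;

// Hypothetical helper; none of these names exist in the patch under review.
final class ReadyWait {

    /** Returns the base interval with roughly +/-50% random jitter, never below 1 ms. */
    static long jitteredInterval(long baseMillis, Random random) {
        long spread = baseMillis / 2;
        // nextDouble() is in [0, 1), so the factor is in [-1, 1).
        long offset = (long) ((random.nextDouble() * 2.0 - 1.0) * spread);
        return Math.max(1L, baseMillis + offset);
    }

    /**
     * Polls isReady until it returns true. An interrupt is not swallowed:
     * Thread.sleep throws InterruptedException, which ends the wait and is
     * propagated so the caller can stop the thread.
     */
    static void waitUntilReady(BooleanSupplier isReady, long baseMillis, Random random)
            throws InterruptedException {
        while (!isReady.getAsBoolean()) {
            Thread.sleep(jitteredInterval(baseMillis, random));
        }
    }
}
```

The base interval is a parameter here so that it could be read from configuration rather than fixed as a constant. If each client seeds its own Random differently, machines that booted together should drift apart after a few polls.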

If this sleep-until-ready pattern is common, it should be factored out into a 
method of its own and shared. I've been stubbing out (for my deployment use) a 
simple lifecycle interface (start/stop/getstatus/ping). If that were adopted, 
this patch could poll the getStatus() method.
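
As a rough illustration of what polling such an interface might look like, here is a sketch. Only the four operation names come from the comment above; the Status values, the method signatures and the LifecycleWait helper are assumptions, not the actual stubbed interface.

```java
// Hypothetical lifecycle interface; signatures and states are illustrative only.
interface Lifecycle {
    enum Status { STARTING, LIVE, STOPPED }

    void start();
    void stop();
    Status getStatus();
    boolean ping();
}

// Hypothetical shared helper for the sleep-until-ready pattern.
final class LifecycleWait {

    /**
     * Polls getStatus() until the service is LIVE (returns true) or STOPPED
     * (returns false). An interrupt during the sleep propagates to the caller.
     */
    static boolean awaitLive(Lifecycle service, long pollMillis)
            throws InterruptedException {
        while (true) {
            Lifecycle.Status status = service.getStatus();
            if (status == Lifecycle.Status.LIVE) {
                return true;
            }
            if (status == Lifecycle.Status.STOPPED) {
                return false; // no point waiting for a service that has stopped
            }
            Thread.sleep(pollMillis);
        }
    }
}
```

One possible advantage of polling a status value rather than a boolean isReady() is that the client can tell "still starting" apart from "never going to start" and give up early in the second case.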

> JobClient should keep on retrying if the jobtracker is still initializing
> -------------------------------------------------------------------------
>
>                 Key: HADOOP-3618
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3618
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Amar Kamat
>            Assignee: Amar Kamat
>         Attachments: HADOOP-3618.patch
>
>
> When the user submits a job while the jobtracker is still initializing, the 
> jobclient exits with an exception. Ideally the jobclient should keep 
> retrying until the jobtracker is up and ready. This will also take care of 
> HADOOP-3289. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
