steveloughran opened a new pull request #1963: HADOOP-16798. S3A Committer 
thread pool shutdown problems.
URL: https://github.com/apache/hadoop/pull/1963
 
 
   
   Contributed by Steve Loughran.
   
   Fixes a condition which can cause job commit to fail if a task was
   aborted less than 60s before the job commit commenced: the task abort
   shuts down the thread pool, with a hard exit after 60s; the
   job commit POST requests are scheduled through the same pool,
   and so are interrupted and fail. At present the access is synchronized,
   but presumably the executor shutdown code is calling wait() and releasing
   locks.
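
   The hazard described above can be reproduced in isolation. The sketch
   below is not the S3A committer code: the class and method names are
   hypothetical, and it assumes the "hard exit" behaves like
   ExecutorService.shutdownNow(), which interrupts running tasks. The
   sleep stands in for an in-flight commit request.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/** Hypothetical demo: work on a shared pool is interrupted when the pool is hard-stopped. */
class SharedPoolHazard {

  /** Returns true if the task running on the pool was interrupted by shutdownNow(). */
  static boolean interruptedByShutdown() throws Exception {
    ExecutorService pool = Executors.newSingleThreadExecutor();
    CountDownLatch started = new CountDownLatch(1);

    // Stand-in for a job commit operation scheduled through the shared pool.
    Future<Boolean> commit = pool.submit(() -> {
      started.countDown();
      try {
        Thread.sleep(10_000);   // simulated long-running request
        return false;           // completed without interruption
      } catch (InterruptedException e) {
        return true;            // interrupted by the pool shutdown
      }
    });

    started.await();            // make sure the task is actually running
    pool.shutdownNow();         // assumed equivalent of the abort path's hard exit
    return commit.get(5, TimeUnit.SECONDS);
  }
}
```

   Calling SharedPoolHazard.interruptedByShutdown() shows the in-flight
   work being interrupted by a shutdown it did not request.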
   
   Task abort is triggered from the AM when task attempts succeed but
   there are still active speculative task attempts running. Thus it
   only surfaces when speculation is enabled and the final tasks are
   speculating, which, given they are the stragglers, is not
   unheard of.
   
   The fix copies and clears the threadPool field in a synchronized block,
   then shuts the copied pool down outside that block; job commit will
   encounter the empty field and demand-create a new pool, as would a
   sequence of task aborts.
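
   A minimal sketch of this copy-and-clear pattern follows. It is an
   illustration under stated assumptions, not the code in the patch: the
   class name, method names, pool size and timeout handling are all
   hypothetical; only the threadPool field name comes from the description
   above.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical holder illustrating copy-and-clear shutdown of a shared pool. */
class PoolHolder {

  /** Shared pool; only read or written while holding the lock on "this". */
  private ExecutorService threadPool;

  /** Demand-create the pool, so callers never see one that is shutting down. */
  synchronized ExecutorService getPool() {
    if (threadPool == null) {
      threadPool = Executors.newFixedThreadPool(4); // size chosen arbitrarily
    }
    return threadPool;
  }

  /** Detach the pool under the lock, then shut it down outside the lock. */
  void destroyPool(long timeout, TimeUnit unit) throws InterruptedException {
    ExecutorService old;
    synchronized (this) {
      old = threadPool;   // copy the reference
      threadPool = null;  // clear the field: the next getPool() creates a new pool
    }
    if (old == null) {
      return;             // nothing to shut down
    }
    old.shutdown();       // stop accepting work, let queued tasks finish
    if (!old.awaitTermination(timeout, unit)) {
      old.shutdownNow();  // hard exit only affects the detached pool
    }
  }
}
```

   Because the field is cleared before the slow shutdown begins, a caller
   arriving during the shutdown window gets a fresh pool rather than the
   one about to be hard-stopped, and repeated destroyPool() calls are
   harmless.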
   

