[
https://issues.apache.org/jira/browse/HADOOP-15679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16584515#comment-16584515
]
Steve Loughran commented on HADOOP-15679:
-----------------------------------------
[~xyao]: I'm adding a log at the end of the run, but not the per-hook details,
as that would be more complicated. With log4j set to debug and printing thread
names, the log of the test run is:
{code}
2018-08-17 16:54:44,969 [main] INFO util.TestShutdownHookManager
(TestShutdownHookManager.java:shutdownHookManager(121)) - invoking
executeShutdown()
2018-08-17 16:54:44,978 [shutdown-hook-0] INFO util.TestShutdownHookManager
(TestShutdownHookManager.java:run(257)) - Starting shutdown of hook4 with sleep
time of 25000
2018-08-17 16:54:46,980 [main] WARN util.ShutdownHookManager
(ShutdownHookManager.java:executeShutdown(128)) - ShutdownHook 'Hook' timeout,
java.util.concurrent.TimeoutException
java.util.concurrent.TimeoutException
at java.util.concurrent.FutureTask.get(FutureTask.java:205)
at
org.apache.hadoop.util.ShutdownHookManager.executeShutdown(ShutdownHookManager.java:124)
at
org.apache.hadoop.util.TestShutdownHookManager.shutdownHookManager(TestShutdownHookManager.java:122)
...
2018-08-17 16:54:46,980 [shutdown-hook-0] INFO util.TestShutdownHookManager
(TestShutdownHookManager.java:run(268)) - Shutdown hook4 interrupted exception
java.lang.InterruptedException: sleep interrupted
at java.lang.Thread.sleep(Native Method)
at
org.apache.hadoop.util.TestShutdownHookManager$Hook.run(TestShutdownHookManager.java:260)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
2018-08-17 16:54:46,984 [shutdown-hook-0] INFO util.TestShutdownHookManager
(TestShutdownHookManager.java:run(257)) - Starting shutdown of hook3 with sleep
time of 1000
2018-08-17 16:54:47,985 [shutdown-hook-0] INFO util.TestShutdownHookManager
(TestShutdownHookManager.java:run(262)) - Completed shutdown of hook3
2018-08-17 16:54:47,986 [shutdown-hook-0] INFO util.TestShutdownHookManager
(TestShutdownHookManager.java:run(257)) - Starting shutdown of hook2 with sleep
time of 0
2018-08-17 16:54:47,986 [shutdown-hook-0] INFO util.TestShutdownHookManager
(TestShutdownHookManager.java:run(262)) - Completed shutdown of hook2
2018-08-17 16:54:47,987 [shutdown-hook-0] INFO util.TestShutdownHookManager
(TestShutdownHookManager.java:run(257)) - Starting shutdown of hook1 with sleep
time of 0
2018-08-17 16:54:47,987 [shutdown-hook-0] INFO util.TestShutdownHookManager
(TestShutdownHookManager.java:run(262)) - Completed shutdown of hook1
2018-08-17 16:54:47,987 [main] INFO util.TestShutdownHookManager
(TestShutdownHookManager.java:shutdownHookManager(123)) - Shutdown completed
// and here, in the real shutdown hook of the process.
2018-08-17 16:54:47,994 [Thread-0] DEBUG util.ShutdownHookManager
(ShutdownHookManager.java:run(97)) - Completed shutdown in 0.000 seconds;
Timeouts: 0
2018-08-17 16:54:47,997 [Thread-0] DEBUG util.ShutdownHookManager
(ShutdownHookManager.java:shutdownExecutor(154)) - ShutdownHookManger completed
shutdown.
{code}
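For anyone reading the trace above without the source to hand: the stack frames (FutureTask.get, Executors$RunnableAdapter, ThreadPoolExecutor) suggest each hook is submitted to an executor and awaited with a bounded Future.get(), with the worker interrupted on timeout. Here is a minimal sketch of that pattern. It is not the ShutdownHookManager code; the class name HookTimeoutSketch and the method runHooks are made up for illustration, and it assumes hooks respond to interruption.

```java
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical illustration only; not the Hadoop ShutdownHookManager source. */
public class HookTimeoutSketch {

  /**
   * Runs each hook in order on a single worker thread, waiting at most
   * timeoutMillis for each one. Returns the number of hooks that timed out.
   */
  public static int runHooks(List<Runnable> hooks, long timeoutMillis) {
    ExecutorService executor = Executors.newSingleThreadExecutor();
    int timeouts = 0;
    try {
      for (Runnable hook : hooks) {
        Future<?> future = executor.submit(hook);
        try {
          future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
          // Interrupt the worker so a sleeping or blocked hook can exit,
          // which frees the single thread for the next hook.
          future.cancel(true);
          timeouts++;
        } catch (ExecutionException e) {
          // The hook itself threw; carry on with the remaining hooks.
        } catch (InterruptedException e) {
          // The caller was interrupted: restore the flag and stop waiting.
          Thread.currentThread().interrupt();
          break;
        }
      }
    } finally {
      executor.shutdownNow();
    }
    return timeouts;
  }
}
```

With a short timeout and one slow hook, the slow hook should see an InterruptedException (as hook4 does in the log) and the later hooks should still run. A hook that ignores interruption would block the single worker thread and delay the hooks queued behind it.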
> ShutdownHookManager shutdown time needs to be configurable & extended
> ---------------------------------------------------------------------
>
> Key: HADOOP-15679
> URL: https://issues.apache.org/jira/browse/HADOOP-15679
> Project: Hadoop Common
> Issue Type: Bug
> Components: util
> Affects Versions: 2.8.0, 3.0.0
> Reporter: Steve Loughran
> Assignee: Steve Loughran
> Priority: Major
> Attachments: HADOOP-15679-001.patch, HADOOP-15679-002.patch,
> HADOOP-15679-002.patch, HADOOP-15679-003.patch
>
>
> HADOOP-12950 added a timeout on shutdowns to avoid problems with hanging
> shutdowns. But the timeout is too short for applications where a large flush
> of data is needed on shutdown.
> A key example of this is Spark apps which save their history to object
> stores, where the file close() call triggers an upload of the final locally
> cached block of data (could be 32+ MB), and then executes the final multipart
> commit.
> Proposed:
> # make the default sleep time 30s, not 10s
> # make it configurable with a time duration property (with a minimum time of
> 1s?)
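The two proposals above (a 30s default, and a configurable value with a floor of 1s) could be sketched roughly as follows. This is an assumption-laden illustration, not the patch: the property name example.shutdown.timeout.seconds and the class ShutdownTimeoutConfigSketch are hypothetical, and it reads a plain number of seconds from java.util.Properties rather than a real time duration property.

```java
import java.util.Properties;

/** Hypothetical illustration of a configurable timeout with a floor. */
public class ShutdownTimeoutConfigSketch {

  // Hypothetical property name, chosen only for this example.
  public static final String TIMEOUT_KEY = "example.shutdown.timeout.seconds";
  // Default proposed in the issue: 30s rather than 10s.
  public static final long DEFAULT_TIMEOUT_SECONDS = 30;
  // Minimum suggested in the issue: 1s.
  public static final long MINIMUM_TIMEOUT_SECONDS = 1;

  /**
   * Returns the configured timeout in seconds, falling back to the default
   * when the property is missing or unparseable, and never going below
   * the minimum.
   */
  public static long getShutdownTimeoutSeconds(Properties props) {
    long seconds = DEFAULT_TIMEOUT_SECONDS;
    String raw = props.getProperty(TIMEOUT_KEY);
    if (raw != null) {
      try {
        seconds = Long.parseLong(raw.trim());
      } catch (NumberFormatException e) {
        // Unparseable value: keep the default.
      }
    }
    return Math.max(seconds, MINIMUM_TIMEOUT_SECONDS);
  }
}
```

Clamping to the minimum rather than rejecting the value means a misconfigured 0 cannot disable the wait entirely; whether that is preferable to failing fast is a design choice the issue leaves open.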