[
https://issues.apache.org/jira/browse/HADOOP-15679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16583352#comment-16583352
]
Xiaoyu Yao commented on HADOOP-15679:
-------------------------------------
Thanks [[email protected]] for working on this. The patch v2 LGTM. I just have
one question and a few minor comments.
It's hard to come up with a magic timeout value that applies to the various
usages of shutdown hooks in Hadoop. With a single configuration, different file
systems may still conflict on their timeout settings. Have you considered a
per-file-system (s3, wasb, etc.) shutdown timeout, passed in when calling
ShutdownHookManager#addShutdownHook in FileSystem#getInternal() as needed,
while keeping the others at a small default value?
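To make the question concrete, here is a minimal sketch of a per-hook timeout,
assuming an overload of addShutdownHook that takes a timeout and unit. The
class PerHookTimeoutSketch, its HookEntry type, and the 10s default are
hypothetical and only illustrate the idea; this is not the patch's code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch only; this is not Hadoop's ShutdownHookManager.
// Not thread-safe: hooks are assumed to be registered before shutdown starts.
final class PerHookTimeoutSketch {
  // Assumed small default for hooks that do not ask for more time.
  static final long DEFAULT_TIMEOUT_MILLIS = 10_000L;

  // One registered hook together with its own priority and timeout.
  private static final class HookEntry {
    final Runnable hook;
    final int priority;
    final long timeoutMillis;

    HookEntry(Runnable hook, int priority, long timeoutMillis) {
      this.hook = hook;
      this.priority = priority;
      this.timeoutMillis = timeoutMillis;
    }
  }

  private final List<HookEntry> hooks = new ArrayList<>();

  // Registers a hook that is content with the small default timeout.
  void addShutdownHook(Runnable hook, int priority) {
    addShutdownHook(hook, priority, DEFAULT_TIMEOUT_MILLIS, TimeUnit.MILLISECONDS);
  }

  // Registers a hook with its own timeout, e.g. an object store client
  // that needs longer to flush data on close().
  void addShutdownHook(Runnable hook, int priority, long timeout, TimeUnit unit) {
    hooks.add(new HookEntry(hook, priority, unit.toMillis(timeout)));
  }

  // Runs the hooks in descending priority order, each bounded by its own
  // timeout, and returns the number of hooks that timed out.
  int executeShutdown() {
    List<HookEntry> ordered = new ArrayList<>(hooks);
    ordered.sort(Comparator.comparingInt((HookEntry e) -> e.priority).reversed());
    ExecutorService executor = Executors.newSingleThreadExecutor();
    int timeouts = 0;
    try {
      for (HookEntry entry : ordered) {
        Future<?> future = executor.submit(entry.hook);
        try {
          future.get(entry.timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
          // Interrupt the hook; one that ignores interrupts would still
          // block the single worker thread and delay later hooks.
          future.cancel(true);
          timeouts++;
        } catch (ExecutionException e) {
          // A failing hook must not prevent the remaining hooks from running.
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          break;
        }
      }
    } finally {
      executor.shutdownNow();
    }
    return timeouts;
  }
}
```

With this shape, a file system that needs a long flush could register with,
say, 30 seconds, while every other caller keeps the short default.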
ShutdownHookManager.java
Line 90: do we intend to use LOG.debug to display the stack info here? Can we
add a conditional guard?
Line 94: can we add a LOG.info (or warn) on the number of timed-out hooks
returned from executeShutdown()? Maybe we can also add a debug log profiling
each shutdown hook's execution time, which would help in choosing the timeout
value in the future.
Line 144: NIT: unfinished comments
Line 183: can we add a LOG.info (or warn) on an invalid timeout value?
Line 297: NIT: blank change
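The logging suggestions above could look roughly like the sketch below. It
uses java.util.logging so that it is self-contained, where isLoggable(FINE)
stands in for a debug-enabled guard in Hadoop's own logger. The class
ShutdownLoggingSketch, its method names, and the 1s minimum are assumptions
for illustration, not what the patch contains.

```java
import java.util.concurrent.TimeUnit;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical sketch only; names and the minimum value are assumptions.
final class ShutdownLoggingSketch {
  private static final Logger LOG =
      Logger.getLogger(ShutdownLoggingSketch.class.getName());

  // Assumed lower bound for a configured shutdown timeout.
  static final long TIMEOUT_MINIMUM_SECONDS = 1L;

  // Warns about an invalid configured timeout and falls back to the minimum.
  static long sanitizeTimeoutSeconds(long configuredSeconds) {
    if (configuredSeconds < TIMEOUT_MINIMUM_SECONDS) {
      LOG.warning(String.format(
          "Invalid shutdown timeout %ds; using the minimum of %ds",
          configuredSeconds, TIMEOUT_MINIMUM_SECONDS));
      return TIMEOUT_MINIMUM_SECONDS;
    }
    return configuredSeconds;
  }

  // Captures a stack trace only when debug-level logging is enabled, so
  // the cost is not paid on the normal shutdown path.
  static void logShutdownCaller() {
    if (LOG.isLoggable(Level.FINE)) {
      LOG.log(Level.FINE, "Shutdown invoked from", new Exception("stack trace"));
    }
  }

  // Runs one hook, logs how long it took at debug level, and returns the
  // elapsed time in milliseconds.
  static long timeHook(String name, Runnable hook) {
    long start = System.nanoTime();
    try {
      hook.run();
    } catch (RuntimeException e) {
      LOG.log(Level.WARNING, "Shutdown hook " + name + " failed", e);
    }
    long elapsedMillis = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    if (LOG.isLoggable(Level.FINE)) {
      LOG.fine("Shutdown hook " + name + " took " + elapsedMillis + " ms");
    }
    return elapsedMillis;
  }

  // Reports how many hooks timed out, at warn level if any did.
  static void logShutdownResult(int timeoutCount) {
    if (timeoutCount > 0) {
      LOG.warning(timeoutCount + " shutdown hook(s) timed out");
    } else {
      LOG.info("All shutdown hooks completed within their timeouts");
    }
  }
}
```

The per-hook timings are only emitted at debug level, so they can be switched
on when tuning the timeout and left off otherwise.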
> ShutdownHookManager shutdown time needs to be configurable & extended
> ---------------------------------------------------------------------
>
> Key: HADOOP-15679
> URL: https://issues.apache.org/jira/browse/HADOOP-15679
> Project: Hadoop Common
> Issue Type: Bug
> Components: util
> Affects Versions: 2.8.0, 3.0.0
> Reporter: Steve Loughran
> Assignee: Steve Loughran
> Priority: Major
> Attachments: HADOOP-15679-001.patch, HADOOP-15679-002.patch,
> HADOOP-15679-002.patch
>
>
> HADOOP-12950 added a timeout on shutdowns to avoid problems with hanging
> shutdowns. But the timeout is too short for applications where a large flush
> of data is needed on shutdown.
> A key example of this is Spark apps which save their history to object
> stores, where the file close() call triggers an upload of the final local
> cached block of data (could be 32+MB), and then executes the final multipart
> commit.
> Proposed
> # make the default sleep time 30s, not 10s
> # make it configurable with a time duration property (with a minimum time of
> 1s?)