[ 
https://issues.apache.org/jira/browse/HDFS-14295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16778533#comment-16778533
 ] 

BELUGA BEHR commented on HDFS-14295:
------------------------------------

[~elgoiri] Thanks for the clarification.  It seems like 'too much' to me to 
separate the {{shutdown}} and {{awaitTermination}} calls.  If we move the 
first call higher up in the overall DN shutdown process, then we would 
probably want to record a timestamp at that point so that we can calculate 
how much time has passed between the {{shutdown}} call and the 
{{awaitTermination}} call.  This would allow the software to wait 30s - Xs, 
where X is the amount of time that elapsed between the two calls, ensuring 
that the shutdown takes no more than 30s overall.  However, I'm not sure how 
much time would even pass between the two calls: 1s? 2s? 5s?  It seems like 
too much complexity and overhead in the code to save a few seconds on 
shutdown (which rarely occurs).
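
For what it's worth, a minimal sketch of the bookkeeping this would involve 
(the names below are made up for illustration and are not from any of the 
attached patches):

{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

// Sketch only: "xferExecutor" stands in for the DataTransfer pool; the point
// is the timestamp bookkeeping needed once shutdown() and awaitTermination()
// are separated across the DataNode shutdown sequence.
class SplitShutdownSketch {
  static void shutdownGracefully(ExecutorService xferExecutor)
      throws InterruptedException {
    final long shutdownRequestedAtNanos = System.nanoTime();
    xferExecutor.shutdown();          // would happen early in DN shutdown

    // ... the rest of the DataNode shutdown sequence would run here ...

    long elapsedMs = TimeUnit.NANOSECONDS
        .toMillis(System.nanoTime() - shutdownRequestedAtNanos);
    long remainingMs = Math.max(0L, TimeUnit.SECONDS.toMillis(30) - elapsedMs);
    if (!xferExecutor.awaitTermination(remainingMs, TimeUnit.MILLISECONDS)) {
      xferExecutor.shutdownNow();     // interrupt anything still running
    }
  }
}
{code}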

Regarding the logging, it's not so obvious.  The JavaDoc for {{shutdownNow}} 
states:

bq. Attempts to stop all actively executing tasks, halts the processing of 
waiting tasks, and returns a list of the tasks that were *awaiting* execution.

So, the list that is returned is not a list of running tasks that got killed.  
It is a list of all the tasks remaining in the queue.  Note, however, that 
the current setup does not queue any tasks: the maximum number of running 
threads is 4096 (by default), and anything submitted after that is rejected, 
not queued.  Therefore, I would not expect this method to ever return 
anything.
 Please note that any threads that are still running will be interrupted by the 
call to {{shutdownNow}} and will most likely generate some sort of "I've been 
interrupted. Exiting." log message from within each Thread.
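
To make the rejection behaviour concrete, here is a small standalone sketch 
using plain {{java.util.concurrent}} (not the exact constructor arguments 
used in the patch): with a {{SynchronousQueue}} nothing is ever queued, so 
{{shutdownNow}} has no pending tasks to return.

{code:java}
import java.util.List;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Sketch of the behaviour described above, not the patch itself.
class NoQueueRejectionSketch {
  public static void main(String[] args) {
    ThreadPoolExecutor pool = new ThreadPoolExecutor(
        0, 4096,                               // up to 4096 concurrent transfers
        60L, TimeUnit.SECONDS,                 // idle threads are reclaimed
        new SynchronousQueue<Runnable>(),      // no queue: submit or reject
        new ThreadPoolExecutor.AbortPolicy()); // excess submissions rejected

    // Running tasks (none in this toy example) are interrupted; the returned
    // list only contains queued-but-unstarted tasks, which this
    // configuration never has.
    List<Runnable> neverRan = pool.shutdownNow();
    System.out.println("Tasks awaiting execution: " + neverRan.size()); // 0
  }
}
{code}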

> Add Threadpool for DataTransfers
> --------------------------------
>
>                 Key: HDFS-14295
>                 URL: https://issues.apache.org/jira/browse/HDFS-14295
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode
>    Affects Versions: 3.2.0
>            Reporter: BELUGA BEHR
>            Assignee: BELUGA BEHR
>            Priority: Major
>         Attachments: HDFS-14295.1.patch, HDFS-14295.2.patch, 
> HDFS-14295.3.patch, HDFS-14295.4.patch, HDFS-14295.5.patch, 
> HDFS-14295.6.patch, HDFS-14295.7.patch, HDFS-14295.8.patch
>
>
> When a DataNode data transfers a block, it spins up a new thread for each 
> transfer.  
> [Here|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java#L2339]
>  and 
> [Here|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java#L3019-L3022].
>    Instead, add the threads to a {{CachedThreadPool}} so that when a thread 
> completes a transfer, it can be re-used for another transfer. This should 
> save the resources spent on creating and spinning up transfer threads.
> One thing I'll point out that's a bit off, which I address in this patch, ...
> There are two places in the code where a {{DataTransfer}} thread is started. 
> In [one 
> place|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java#L2339-L2341],
>  it's started in a default thread group. In [another 
> place|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java#L3019-L3022],
>  it's started in the 
> [dataXceiverServer|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java#L1164]
>  thread group.
> I do not think it's correct to include any of these threads in the 
> {{dataXceiverServer}} thread group. Anything submitted to the 
> {{dataXceiverServer}} should probably be tied to the 
> {{dfs.datanode.max.transfer.threads}} configuration, and neither of these 
> methods is. Instead, they should be submitted to the same thread pool with 
> its own thread group (probably the default thread group, unless someone 
> suggests otherwise), which is what I have included in this patch.
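
For reference, the overall shape described in the issue above (one shared 
cached pool with its own thread group, in place of ad-hoc per-transfer 
threads) could look roughly like the following. The class and method names 
are illustrative only; this uses plain {{java.util.concurrent}} rather than 
the actual patch code.

{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch: replace per-transfer "new Thread(...).start()" calls
// with submissions to one shared cached pool whose ThreadFactory pins every
// worker thread to a single, explicitly chosen thread group.
class TransferPoolSketch {
  private static final AtomicInteger COUNTER = new AtomicInteger();

  static ExecutorService newTransferPool(final ThreadGroup group) {
    ThreadFactory factory = new ThreadFactory() {
      @Override
      public Thread newThread(Runnable r) {
        Thread t = new Thread(group, r,
            "DataTransfer-" + COUNTER.incrementAndGet());
        t.setDaemon(true);
        return t;
      }
    };
    return Executors.newCachedThreadPool(factory);
  }

  public static void main(String[] args) {
    ExecutorService transferPool =
        newTransferPool(Thread.currentThread().getThreadGroup());
    // In the DataNode this would be something like:
    //   transferPool.execute(new DataTransfer(...));
    transferPool.execute(() -> System.out.println("block transfer here"));
    transferPool.shutdown();
  }
}
{code}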


