[ https://issues.apache.org/jira/browse/HDFS-11338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15963614#comment-15963614 ]

Uma Maheswara Rao G commented on HDFS-11338:
--------------------------------------------

It is good to see the failures fixed now. However, I have a few comments on the
changes.
join() is a Thread method, so exposing it directly on non-thread classes like
BlockStorageMovementAttemptedItems may not be appropriate, IMO.

How about adding a method called disable() instead of join()? It would disable
the functionality by setting the running flags to false and interrupting the
internal threads.
Then rename the current stop() method to stopGracefully(). This method should do
the following: if the thread is still running, interrupt it and then join; if it
is not running, just join, so that the stop is graceful.
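
A minimal sketch of this proposal, assuming a hypothetical ItemsMonitor class that stands in for a non-thread component such as BlockStorageMovementAttemptedItems and owns a single worker thread. The names ItemsMonitor, running and isWorkerAlive(), and the 5-second sleep loop, are illustrative assumptions, not the code in the HDFS-10285 branch. The thread stays private, so callers use disable() and stopGracefully() instead of join():

```java
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a non-thread class such as BlockStorageMovementAttemptedItems:
// it owns its worker thread, so callers never need Thread#join directly.
class ItemsMonitor {
  private volatile boolean running = false;
  private Thread worker;

  synchronized void start() {
    running = true;
    worker = new Thread(() -> {
      while (running) {
        try {
          TimeUnit.SECONDS.sleep(5); // simulated periodic check (assumption)
        } catch (InterruptedException e) {
          // Interrupted by disable(): restore the flag and exit promptly.
          Thread.currentThread().interrupt();
          return;
        }
      }
    }, "items-monitor");
    worker.setDaemon(true);
    worker.start();
  }

  // Not graceful: clear the running flag and interrupt the thread, without waiting.
  synchronized void disable() {
    running = false;
    if (worker != null) {
      worker.interrupt();
    }
  }

  // Graceful: disable first if still running, then wait for the thread to exit.
  void stopGracefully() {
    Thread t;
    synchronized (this) {
      if (running) {
        disable();
      }
      t = worker;
    }
    if (t == null) {
      return;
    }
    try {
      t.join();
    } catch (InterruptedException e) {
      // Preserve the caller's interrupt status rather than swallowing it.
      Thread.currentThread().interrupt();
    }
  }

  boolean isRunning() {
    return running;
  }

  synchronized boolean isWorkerAlive() {
    return worker != null && worker.isAlive();
  }
}
```

Calling stopGracefully() alone covers a full stop; calling disable() earlier only means the later join waits for a thread that is already exiting.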

So, if you want a two-step stop to save time, call disable() first (this is not
a graceful stop), then interrupt the other system threads, and finally call
stopGracefully(). That final call guarantees a graceful stop: it calls disable()
if the component is not already disabled, and then joins.
1. Use stopGracefully() for the dynamic start/stop feature.
2. Use the two-step stop for the NN start/stop case to save time.
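
The two-step NN stop can be sketched as follows, under the same kind of assumptions: Component and TwoStepStop are hypothetical names, and the sleep loop stands in for real work. Disabling every component first lets all worker threads begin exiting together, so the later stopGracefully() calls only wait for threads that are already on their way out:

```java
import java.util.List;

// Hypothetical component exposing the two stop operations proposed above.
class Component {
  private volatile boolean running = false;
  private Thread worker;

  synchronized void start(String name) {
    running = true;
    worker = new Thread(() -> {
      while (running) {
        try {
          Thread.sleep(5_000); // simulated periodic work (assumption)
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          return; // interrupted by disable(): exit promptly
        }
      }
    }, name);
    worker.setDaemon(true);
    worker.start();
  }

  // Step 1 (not graceful): clear the flag and interrupt, without waiting.
  synchronized void disable() {
    running = false;
    if (worker != null) {
      worker.interrupt();
    }
  }

  // Step 2 (graceful): disable only if still running, then join the thread.
  void stopGracefully() {
    Thread t;
    synchronized (this) {
      if (running) {
        disable(); // skipped when disable() was already called
      }
      t = worker;
    }
    if (t == null) {
      return;
    }
    try {
      t.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  synchronized boolean isWorkerAlive() {
    return worker != null && worker.isAlive();
  }
}

class TwoStepStop {
  // Disable everything first so all threads start exiting at once,
  // then join each one; no join waits on a thread that is still working.
  static void stopAll(List<Component> components) {
    for (Component c : components) {
      c.disable();
    }
    // Interrupts for other system threads (e.g. during NN shutdown) would go here.
    for (Component c : components) {
      c.stopGracefully();
    }
  }
}
```

For the dynamic start/stop case, a single stopGracefully() call on the one component is enough.
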
Thoughts?




> [SPS]: Fix timeout issue in unit tests caused by longer NN down time
> --------------------------------------------------------------------
>
>                 Key: HDFS-11338
>                 URL: https://issues.apache.org/jira/browse/HDFS-11338
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: datanode, namenode
>            Reporter: Wei Zhou
>            Assignee: Wei Zhou
>         Attachments: HDFS-11338-HDFS-10285.00.patch, 
> HDFS-11338-HDFS-10285.01.patch, HDFS-11338-HDFS-10285-02.patch, 
> HDFS-11338-HDFS-10285-03.patch
>
>
> As discussed in HDFS-11186, stopping the NN takes longer because of this wait:
> {code}
> try {
>   // Blocks NN shutdown for up to 3 seconds waiting for the SPS thread to exit.
>   storagePolicySatisfierThread.join(3000);
> } catch (InterruptedException ie) {
> }
> {code}
> As a result, some tests take longer to finish, which leads to timeout
> failures.


