[ https://issues.apache.org/jira/browse/HDFS-13174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Istvan Fajth updated HDFS-13174:
--------------------------------
    Status: Patch Available  (was: Open)

Hello [~jojochuang],

thank you very much for the review. I have attached a new version of the patch 
that addresses the code-related issues you found.

Let me also address the questions you raised:
 - about ignoring the general timeout:
 The main problem with MAX_ITERATION_TIME for the Mover is that the Mover does 
not have iterations, so if we want an overall timeout for the Mover we either 
have to introduce iterations for it as well, or simply ignore this timeout. The 
patch does the latter, which I believe is sufficient because other timeouts 
already cover the problematic case. The problem appears when the move is 
between nodes, and for that we have a connection timeout set to 
HdfsConstants.READ_TIMEOUT (60 seconds), and 5 times the 
HdfsConstants.READ_TIMEOUT value set as the general socket timeout, so after 5 
minutes we abandon a move to a DataNode that does not respond. (See 
Dispatcher.java lines 365-367 and 373 after applying my patch, and the first 
sketch below.) DataNodes should respond twice within the time set via 
HdfsClientConfigKeys.DFS_CLIENT_SOCKET_TIMEOUT_KEY (default 60 seconds), so I 
think a Mover thread stays stuck on a failing DataNode only for a reasonable 
time. These delays are what fail an iteration in the Balancer; when that 
happens, the Balancer cleans up all the work already scheduled for that 
iteration and starts a new one. Since the Mover does not have iterations, the 
MAX_ITERATION_TIME check I removed simply failed the Mover in the same 
scenario.

 - about the flakiness of the test:
 I just added a note about this to the test in the current patch. After the 
cluster starts up, the following happens between the two time checks: the 
blocks are read from the DataNodes, the Balancer decides to move two blocks, 
and it schedules the two block moves. That is quite a few operations. The 
Balancer should fail after 2 seconds of running, that is at the 3rd heartbeat 
around the 3-second mark; in my environment the total runtime measured is under 
3100ms, so I felt safe leaving about 500ms of slack in the assertion for the 
scheduling and for getting the result back from the DataNodes on slower or 
busier environments. If needed we can either remove this assertion or increase 
the time to be more on the safe side. I am against removing the time check, 
though, as without it we would not be testing the timeout at all, only that the 
Balancer stopped the iteration while there were still moves in progress. (See 
the second sketch below for the shape of the assertion I mean.)
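
The first sketch referenced above is a minimal illustration of the two bounds 
involved. It is a simplification based on my reading of the code, not the exact 
contents of Dispatcher.java, and the class and method names are made up for the 
illustration:

{code:java}
import java.net.InetSocketAddress;
import java.net.Socket;

// Simplified illustration of the timeouts discussed above; only the constant
// values mirror the real ones, everything else is named for the sketch.
public class MoveTimeoutSketch {

  // HdfsConstants.READ_TIMEOUT is 60 seconds.
  static final int READ_TIMEOUT = 60 * 1000;
  // The hardwired per-iteration limit introduced by HDFS-11015.
  static final long MAX_ITERATION_TIME = 20 * 60 * 1000L;

  // Per-move bounds: these stay in place for both the Balancer and the Mover.
  static Socket connectToTargetDatanode(String host, int xferPort)
      throws Exception {
    Socket sock = new Socket();
    // The connection attempt is abandoned after 60 seconds...
    sock.connect(new InetSocketAddress(host, xferPort), READ_TIMEOUT);
    // ...and any single read waits at most 5 * 60 seconds, so a move to an
    // unresponsive DataNode is given up after roughly 5 minutes.
    sock.setSoTimeout(5 * READ_TIMEOUT);
    return sock;
  }

  // Per-iteration bound: the Balancer keeps this check and simply starts a
  // new iteration when it trips, while the patch skips it for the Mover,
  // which has no iterations to fall back on.
  static boolean iterationTimedOut(long iterationStartMs) {
    return System.currentTimeMillis() - iterationStartMs > MAX_ITERATION_TIME;
  }
}
{code}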

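The second sketch shows roughly the shape of the timing assertion I would like 
to keep. The runner type and the 3500 ms bound (the 3-second mark plus 500 ms 
of slack) are illustrative, not the exact code of the test in the patch:

{code:java}
import static org.junit.Assert.assertTrue;

// Rough sketch of the timing assertion discussed above; the bound and the
// runner interface are illustrative, not the exact code of the test.
public class IterationTimeoutAssertionSketch {

  // Stand-in for running the Balancer against the mini cluster with the
  // iteration time limit configured to 2 seconds.
  interface BalancerRun {
    void run() throws Exception;
  }

  static void assertTimesOutQuickly(BalancerRun run) throws Exception {
    long start = System.currentTimeMillis();
    run.run();
    long elapsedMs = System.currentTimeMillis() - start;
    // Without this upper bound we would only verify that the run stopped
    // while moves were still in progress, not that the 2-second limit
    // actually cut the iteration short; the extra ~500 ms is slack for
    // slower or busier environments.
    assertTrue("Balancer run took " + elapsedMs
        + " ms, expected the iteration timeout to stop it sooner",
        elapsedMs < 3500);
  }
}
{code}
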
Let me know your thoughts on the approach I took, and please also check the new 
patch. Thank you!

> hdfs mover -p /path times out after 20 min
> ------------------------------------------
>
>                 Key: HDFS-13174
>                 URL: https://issues.apache.org/jira/browse/HDFS-13174
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: balancer & mover
>    Affects Versions: 3.0.0-alpha2, 2.7.4, 2.8.0
>            Reporter: Istvan Fajth
>            Assignee: Istvan Fajth
>            Priority: Major
>         Attachments: HDFS-13174.001.patch, HDFS-13174.002.patch
>
>
> HDFS-11015 introduced an iteration timeout in the Dispatcher.Source class 
> that is checked while dispatching the moves that the Balancer and the Mover 
> do. This timeout is hardwired to 20 minutes.
> The Balancer works in iterations, so even if an iteration times out, the 
> Balancer keeps running and starts another iteration; it only fails if no 
> moves happened in a few consecutive iterations.
> The Mover, on the other hand, does not have iterations, so if moving a path 
> runs for more than 20 minutes and there are moves decided and enqueued 
> between two DataNodes, the Mover stops after 20 minutes with the following 
> exception reported to the console (line numbers might differ, as this 
> exception came from a CDH 5.12.1 installation).
>  java.io.IOException: Block move timed out
>  at org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.receiveResponse(Dispatcher.java:382)
>  at org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.dispatch(Dispatcher.java:328)
>  at org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.access$2500(Dispatcher.java:186)
>  at org.apache.hadoop.hdfs.server.balancer.Dispatcher$1.run(Dispatcher.java:956)
>  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
>  
> Note that this issue does not come up if all blocks can be moved within the 
> DataNodes, without having to move a block to another DataNode.



