[jira] [Commented] (HDFS-16423) balancer should not get blocks on stale storages

Stephen O'Donnell (Jira) Wed, 26 Jan 2022 00:51:35 -0800


    [ 
https://issues.apache.org/jira/browse/HDFS-16423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17482324#comment-17482324
 ]


Stephen O'Donnell commented on HDFS-16423:
------------------------------------------

I have a question on this Jira.

As I understand it, the namenode marks all storages as stale after a failover. 
The only way a storage is marked "not stale" again is when the datanode sends a 
full block report (FBR). The FBR interval is 6 hours by default, but some 
clusters set it higher. I don't think there is any mechanism to trigger the FBR 
early because of a failover.

If we tell the balancer to not pick blocks from stale storages, then after a 
failover the balancer will effectively not work at all for up to the FBR 
interval, as there will be no storages for it to pick blocks from. Is that 
correct?
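
To illustrate the concern, here is a minimal sketch of that behaviour. It is a 
toy model, not the actual NameNode or Balancer code: the class 
StaleStorageSketch, its Storage type, and all method names are hypothetical and 
exist only for illustration. It assumes every storage is flagged stale on 
failover and that only an FBR clears the flag.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model; none of these names are real HDFS classes or methods.
public class StaleStorageSketch {

    // A storage with a staleness flag, as tracked by the (modelled) namenode.
    public static final class Storage {
        public final String id;
        public boolean stale;

        public Storage(String id) {
            this.id = id;
        }
    }

    // Assumption: after a failover the newly active NN treats every storage as stale.
    public static void onFailover(List<Storage> storages) {
        for (Storage s : storages) {
            s.stale = true;
        }
    }

    // Assumption: a full block report from the datanode is the only thing that clears the flag.
    public static void onFullBlockReport(Storage s) {
        s.stale = false;
    }

    // Proposed balancer behaviour: only non-stale storages may supply blocks to move.
    public static List<String> eligibleSources(List<Storage> storages) {
        List<String> ids = new ArrayList<>();
        for (Storage s : storages) {
            if (!s.stale) {
                ids.add(s.id);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        List<Storage> storages = new ArrayList<>();
        storages.add(new Storage("DS-1"));
        storages.add(new Storage("DS-2"));

        onFailover(storages);
        System.out.println(eligibleSources(storages)); // prints []

        onFullBlockReport(storages.get(0));
        System.out.println(eligibleSources(storages)); // prints [DS-1]
    }
}
```

In this model the balancer has nothing to pick from between the failover and 
the first FBR, which is the window I am asking about.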

I wonder if we should log a message in the NN indicating that all storages are 
stale, to help people understand why the balancer is not working. 
Probably, the balancer will try 5 times to get some blocks to move and then 
give up with a "failed to move any blocks in 5 iterations" message.

> balancer should not get blocks on stale storages
> ------------------------------------------------
>
>                 Key: HDFS-16423
>                 URL: https://issues.apache.org/jira/browse/HDFS-16423
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: balancer & mover
>            Reporter: qinyuren
>            Assignee: qinyuren
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.4.0, 3.3.3
>
>         Attachments: image-2022-01-13-17-18-32-409.png
>
>          Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> We ran into the problem described in HDFS-16420.
> We found that the balancer copied a block multiple times without deleting the 
> source block if the block was placed on a stale storage. This results in a 
> block with many copies, and the redundant copies are not deleted until the 
> storage is no longer stale.
>  
> !image-2022-01-13-17-18-32-409.png|width=657,height=275!



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
