[ 
https://issues.apache.org/jira/browse/HDFS-14687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16896944#comment-16896944
 ] 

Surendra Singh Lilhore commented on HDFS-14687:
-----------------------------------------------

Root Cause
==========

The SBN keeps IBR messages for future blocks in the 
{{PendingDataNodeMessages#queueByBlockId}} map.
{code:java}
Map<Block, Queue<ReportedBlockInfo>> queueByBlockId =  Maps.newHashMap();
{code}
This map uses the block as the key and holds the reported replicas in the value queue.

*ex :*
 blk_1073779215_38391 ==> \{blk_1073779215_38391 from DN1, blk_1073779215_38391 
from DN2, blk_1073779215_38391 from DN3}

In the case of EC, the block group should be the key and the EC sub-blocks 
should be in the value queue.

*ex :*
 blk_-9223372036812232768_2862155 ==> \{blk_-9223372036812232768 from DN1, 
blk_-9223372036812232767 from DN2, blk_-9223372036812232766  from DN3, 
blk_-9223372036812232765 from DN4, blk_-9223372036812232764 from DN5}
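
One way to picture the expected layout is to normalize every reported sub-block ID to its block group ID before using it as the map key. The following is a minimal sketch, not the actual HDFS code or patch: the class and method names are hypothetical, plain strings stand in for ReportedBlockInfo, generation stamps are left out, and it assumes that EC block IDs are negative and that the low 4 bits carry the index of the sub-block inside its group (which is consistent with the IDs in the example above).

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Hypothetical stand-in for PendingDataNodeMessages; not the real HDFS class.
class GroupKeyedQueueSketch {
    // Assumption: the low 4 bits of an EC block ID hold the index in the group.
    static final long BLOCK_GROUP_INDEX_MASK = 15L;

    // Key is a block ID (block group ID for EC); value is the queued reports.
    final Map<Long, Queue<String>> queueByBlockId = new HashMap<>();

    // Assumption: EC block IDs are negative, replicated block IDs are not.
    static long queueKey(long blockId) {
        return blockId < 0 ? blockId & ~BLOCK_GROUP_INDEX_MASK : blockId;
    }

    // Queue a report under the normalized key instead of the raw sub-block ID.
    void enqueue(long reportedBlockId, String datanode) {
        queueByBlockId
            .computeIfAbsent(queueKey(reportedBlockId), k -> new ArrayDeque<>())
            .add("blk_" + reportedBlockId + " from " + datanode);
    }

    public static void main(String[] args) {
        GroupKeyedQueueSketch pending = new GroupKeyedQueueSketch();
        long groupId = -9223372036812232768L;
        for (int i = 0; i < 5; i++) {
            pending.enqueue(groupId + i, "DN" + (i + 1));
        }
        // All five sub-blocks land in one queue under the block group ID.
        System.out.println(pending.queueByBlockId.size());
        System.out.println(pending.queueByBlockId.get(groupId));
    }
}
```

With this keying, a single lookup by block group ID returns the reports from all five DataNodes.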

But currently each sub-block is stored as a separate entry in the map, which is 
wrong.

*ex :*
 blk_-9223372036812232768_2862155 ==> \{blk_-9223372036812232768 from DN1}
 blk_-9223372036812232767_2862155 ==> \{blk_-9223372036812232767 from DN2}
 blk_-9223372036812232766_2862155 ==> \{blk_-9223372036812232766 from DN3}
 blk_-9223372036812232765_2862155 ==> \{blk_-9223372036812232765 from DN4}
 blk_-9223372036812232764_2862155 ==> \{blk_-9223372036812232764 from DN5}

When the SBN tails the edit log and applies OP_ADD_BLOCK, it takes only the 
first entry, the one keyed by the block group ID, and processes it. The other 
entries are never processed, so the EC block is never marked as safe, because 
marking it safe needs all of its sub-blocks.

This is also a kind of memory leak, because the unprocessed entries are never removed from the map.
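
The effect described above can be reproduced with a small model. This is only a sketch under stated assumptions: the class is a hypothetical stand-in rather than the real PendingDataNodeMessages, plain strings stand in for ReportedBlockInfo, generation stamps are left out, and {{takeBlockQueue}} here is a simplified model of the lookup that happens when OP_ADD_BLOCK is applied.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Hypothetical model of the current behaviour; not the real HDFS class.
class StrandedEntriesSketch {
    final Map<Long, Queue<String>> queueByBlockId = new HashMap<>();

    // Current behaviour: every sub-block ID becomes its own map key.
    void enqueue(long reportedBlockId, String datanode) {
        queueByBlockId
            .computeIfAbsent(reportedBlockId, k -> new ArrayDeque<>())
            .add("blk_" + reportedBlockId + " from " + datanode);
    }

    // Simplified lookup on OP_ADD_BLOCK: only the block group ID is consulted.
    Queue<String> takeBlockQueue(long blockGroupId) {
        Queue<String> queue = queueByBlockId.remove(blockGroupId);
        return queue == null ? new ArrayDeque<>() : queue;
    }

    public static void main(String[] args) {
        StrandedEntriesSketch pending = new StrandedEntriesSketch();
        long groupId = -9223372036812232768L;
        for (int i = 0; i < 5; i++) {
            pending.enqueue(groupId + i, "DN" + (i + 1));
        }
        Queue<String> processed = pending.takeBlockQueue(groupId);
        // Only the report from DN1 is processed.
        System.out.println("processed: " + processed.size());
        // The reports from DN2..DN5 stay in the map and are never looked up.
        System.out.println("stranded:  " + pending.queueByBlockId.size());
    }
}
```

In this model one report is processed and four remain in the map, which matches both the missing safe blocks and the leak.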

> Standby Namenode never come out of safemode when EC files are being written.
> ----------------------------------------------------------------------------
>
>                 Key: HDFS-14687
>                 URL: https://issues.apache.org/jira/browse/HDFS-14687
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: ec, namenode
>    Affects Versions: 3.1.1
>            Reporter: Surendra Singh Lilhore
>            Assignee: Surendra Singh Lilhore
>            Priority: Critical
>
> When a huge number of EC files are being written and the SBN is restarted, it 
> never comes out of safe mode, and the required block count keeps increasing.
> {noformat}
> The reported blocks 16658401 needs additional 1702 blocks to reach the 
> threshold 0.99999 of total blocks 16660120.
> The reported blocks 16658659 needs additional 2935 blocks to reach the 
> threshold 0.99999 of total blocks 16661611.
> The reported blocks 16659947 needs additional 3868 blocks to reach the 
> threshold 0.99999 of total blocks 16663832.
> The reported blocks 16666335 needs additional 5116 blocks to reach the 
> threshold 0.99999 of total blocks 16671468.
> The reported blocks 16669311 needs additional 6384 blocks to reach the 
> threshold 0.99999 of total blocks 16675712.
> {noformat}


