[ 
https://issues.apache.org/jira/browse/HDFS-12638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16263561#comment-16263561
 ] 

Konstantin Shvachko commented on HDFS-12638:
--------------------------------------------

I looked more closely through the changes introduced in HDFS-9754. It is a 
major refactoring whose goal appears to be a minor optimization, and it has 
no test coverage. The main problem is that it violates the key invariant 
mentioned in the [first 
comment|https://issues.apache.org/jira/browse/HDFS-9754?focusedCommentId=15131492&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15131492]
 of that jira, stating
* *A block is always associated with a file when added to {{BlocksMap}}*

I propose to revert HDFS-9754. I think the invariant is important and should be 
preserved.
Reverting is not trivial, since more refactoring has accumulated on top of it, 
but it is possible.

The patch I posted here is still needed, since we should remove the extra block 
in the truncate case. It is not as critical though, since this is not the cause 
of the NPE and the truncate block is eventually deleted, either when the DN 
finishes truncate or on the next block report. So to resolve this blocker for 
upcoming releases (2.8, 2.9, 3.0) we need to complete the revert.
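
To make the invariant concrete, here is a minimal sketch of the idea, not the actual Hadoop code. All names ({{InvariantSketch}}, {{FileStub}}, {{BlocksMapStub}}, {{orphan}}, {{schedulable}}) are hypothetical stand-ins, and the orphaning step is only an assumed way for a block to lose its file. It shows an add path that refuses a block without an owning file, and a scheduling path that skips a block with a null owner instead of dereferencing it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-ins; these are NOT the real Hadoop classes.
final class InvariantSketch {

    /** Stand-in for the file (block collection) that owns a block. */
    static final class FileStub {
        final String path;
        FileStub(String path) { this.path = path; }
    }

    /** Stand-in for a blocks map: block ID -> owning file. */
    static final class BlocksMapStub {
        private final Map<Long, FileStub> owners = new HashMap<>();

        /** Enforces the invariant: a block is only added together with its file. */
        void addBlock(long blockId, FileStub owner) {
            if (owner == null) {
                throw new IllegalArgumentException(
                    "block " + blockId + " has no owning file");
            }
            owners.put(blockId, owner);
        }

        /** Assumed scenario: the file goes away but the block entry stays behind. */
        void orphan(long blockId) {
            owners.put(blockId, null);
        }

        FileStub getOwner(long blockId) {
            return owners.get(blockId);
        }
    }

    /** Returns the candidates that still have an owner, skipping orphans and unknown IDs. */
    static List<Long> schedulable(BlocksMapStub map, List<Long> candidates) {
        List<Long> result = new ArrayList<>();
        for (long id : candidates) {
            if (map.getOwner(id) != null) {
                result.add(id);
            }
        }
        return result;
    }
}
```

With blocks 1 and 2 added and block 2 then orphaned, {{schedulable}} returns only block 1. The null check is a defensive measure; the invariant itself is what keeps a null owner from reaching the scheduler in the first place.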

> NameNode exits due to ReplicationMonitor thread received Runtime exception in 
> ReplicationWork#chooseTargets
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-12638
>                 URL: https://issues.apache.org/jira/browse/HDFS-12638
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs
>    Affects Versions: 2.8.2
>            Reporter: Jiandan Yang 
>            Priority: Blocker
>         Attachments: HDFS-12638-branch-2.8.2.001.patch, HDFS-12638.002.patch, 
> OphanBlocksAfterTruncateDelete.jpg
>
>
> The active NameNode exits due to an NPE. I can confirm that the BlockCollection 
> passed in when creating ReplicationWork is null, but I do not know why it is 
> null. Looking through the history, I found that 
> [HDFS-9754|https://issues.apache.org/jira/browse/HDFS-9754] removed the check 
> for whether BlockCollection is null.
> The NN logs are as follows:
> {code:java}
> 2017-10-11 16:29:06,161 ERROR [ReplicationMonitor] 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: 
> ReplicationMonitor thread received Runtime exception.
> java.lang.NullPointerException
>         at 
> org.apache.hadoop.hdfs.server.blockmanagement.ReplicationWork.chooseTargets(ReplicationWork.java:55)
>         at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReplicationWorkForBlocks(BlockManager.java:1532)
>         at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReplicationWork(BlockManager.java:1491)
>         at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:3792)
>         at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:3744)
>         at java.lang.Thread.run(Thread.java:834)
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
