[
https://issues.apache.org/jira/browse/HADOOP-3002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Konstantin Shvachko updated HADOOP-3002:
----------------------------------------
Attachment: DelBlocksInSafeMode.patch
This patch postpones the removal of blocks until safe mode is off.
The main cause of the deletions was that block report processing removed
blocks that do not belong to any file directly, bypassing the regular mechanism
that first adds invalid blocks to recentInvalidateSets and then schedules them
for deletion via heartbeats.
# I changed block report processing to just place invalid blocks into
recentInvalidateSets without returning any commands to the data-nodes. This also
optimizes processReport(), because it no longer scans the block report a second
time looking for invalid blocks.
# I changed heartbeat processing, because it never checked safe mode and would
schedule replications or deletions if there were any in the pending lists.
During startup the pending lists are empty, but in manual safe mode that may not
be the case. So now the only commands allowed while safe mode is on are requests
for block reports and distributed upgrade commands.
It is not clear why some of the code in handleHeartbeat() is inside the
synchronized section and some is not, so I placed everything inside.
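The flow described above can be illustrated with a minimal sketch. This is not
the code from the patch: the class SafeModeSketch, its fields, and the method
signatures are hypothetical simplifications. Only the names processReport(),
handleHeartbeat() and recentInvalidateSets come from the description. Block ids
are plain longs, and the only "command" modelled is a list of blocks to delete.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical, simplified stand-in for the name-node; not the actual patch. */
class SafeModeSketch {
    private boolean safeMode = true;
    // Blocks that belong to some file (assumed to be loaded from the namespace).
    private final Set<Long> knownBlocks = new HashSet<>();
    // Per data-node queue of invalid blocks waiting to be deleted.
    private final Map<String, Set<Long>> recentInvalidateSets = new HashMap<>();

    synchronized void addKnownBlock(long blockId) {
        knownBlocks.add(blockId);
    }

    synchronized void setSafeMode(boolean on) {
        safeMode = on;
    }

    /**
     * Block report: blocks that belong to no file are only queued.
     * Nothing is returned, so the data-node gets no deletion command here.
     */
    synchronized void processReport(String dataNode, Collection<Long> reported) {
        for (long blockId : reported) {
            if (!knownBlocks.contains(blockId)) {
                recentInvalidateSets
                    .computeIfAbsent(dataNode, k -> new TreeSet<>())
                    .add(blockId);
            }
        }
    }

    /**
     * Heartbeat: the whole body runs under the lock. In safe mode no block
     * deletions are handed out; the queue is left intact for later.
     */
    synchronized List<Long> handleHeartbeat(String dataNode) {
        if (safeMode) {
            return new ArrayList<>();
        }
        Set<Long> pending = recentInvalidateSets.remove(dataNode);
        return pending == null ? new ArrayList<>() : new ArrayList<>(pending);
    }
}
```

In this sketch a block reported during safe mode stays queued, and it is
handed to the data-node for deletion only by the first heartbeat after safe
mode is turned off.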
> HDFS should not remove blocks while in safemode.
> ------------------------------------------------
>
> Key: HADOOP-3002
> URL: https://issues.apache.org/jira/browse/HADOOP-3002
> Project: Hadoop Core
> Issue Type: Bug
> Components: dfs
> Affects Versions: 0.16.0
> Reporter: Konstantin Shvachko
> Assignee: Konstantin Shvachko
> Priority: Blocker
> Fix For: 0.17.0, 0.18.0
>
> Attachments: DelBlocksInSafeMode.patch
>
>
> I noticed that data-nodes are removing blocks during a rather prolonged
> distributed upgrade when the name-node is in safe mode.
> This happened on my experimental cluster with accelerated block report rate.
> By definition in safe mode the name-node should not
> - accept client requests to change the namespace state, and
> - schedule block replications and/or block removal for the data-nodes.
> We don't want any unnecessary replications until all blocks are reported
> during startup.
> We also don't want to remove blocks if safe mode is entered manually.
> In heartbeat processing we explicitly verify that the name-node is in
> safe-mode and do not return any block commands to the data-nodes.
> Block reports can also return block commands, which should be banned during
> safe mode.