[ https://issues.apache.org/jira/browse/HADOOP-1134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12507049 ]

Raghu Angadi commented on HADOOP-1134:
--------------------------------------

Brief outline of when an upgrade is considered complete and how a manual 
override works:

  * The upgrade process starts when the normal safemode conditions are met.
  * Any datanode that is already registered, or that heartbeats after the 
upgrade starts, will be asked to perform the datanode upgrade.
  * Without a manual override, the namenode waits for *all* of the nodes it 
knows about to complete their upgrade.
  * Every few minutes the namenode prints a brief message about the current 
status in its log (see the sketch after this list).
  * The brief report lists the datanodes that have not finished if only a 
handful are left (maybe <= 10?).
  * If some datanodes go down after the upgrade starts, the automatic upgrade 
might never finish.
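
To make the bookkeeping above concrete, here is a minimal sketch of the 
namenode-side tracking. The class and method names are illustrative only, not 
what the patch will actually use:

{code:java}
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the upgrade bookkeeping described above.
class UpgradeMonitor {
  private final Set<String> knownDatanodes = new HashSet<String>();
  private final Set<String> finishedDatanodes = new HashSet<String>();
  // "handful" threshold for listing stragglers in the brief report
  private static final int BRIEF_REPORT_THRESHOLD = 10;

  // Called when a datanode registers or heartbeats after the upgrade starts.
  synchronized void datanodeSeen(String name) {
    knownDatanodes.add(name);
  }

  // Called when a datanode reports that its upgrade is complete.
  synchronized void datanodeFinished(String name) {
    finishedDatanodes.add(name);
  }

  // Without a manual override, the upgrade is complete only when *all*
  // known datanodes have finished.
  synchronized boolean isUpgradeComplete() {
    return !knownDatanodes.isEmpty() &&
           finishedDatanodes.containsAll(knownDatanodes);
  }

  // Invoked every few minutes to produce the brief status message.
  synchronized String briefStatus() {
    Set<String> pending = new HashSet<String>(knownDatanodes);
    pending.removeAll(finishedDatanodes);
    StringBuilder msg = new StringBuilder("Upgrade status: ")
        .append(finishedDatanodes.size()).append(" of ")
        .append(knownDatanodes.size()).append(" datanodes done.");
    if (!pending.isEmpty() && pending.size() <= BRIEF_REPORT_THRESHOLD) {
      msg.append(" Still waiting for: ").append(pending);
    }
    return msg.toString();
  }
}
{code}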

Manual override:

   * When an admin notices an upgrade that seems to be stuck, a 'detailed 
report' can be requested.
   * A detailed report iterates through all the blocks and counts how many 
blocks fall into each of the following categories:
   *# At least minReplicas replicas placed on upgraded datanodes.
   *# All of the replicas placed on upgraded datanodes.
   *# None of the replicas placed on upgraded datanodes.
   * Based on the above data, an admin can decide either to manually stop the 
upgrade (if there are no blocks with zero upgraded replicas) or to bring up a 
dead datanode instead (see the sketch below).
   * The detailed report and the manual override will be available through new 
dfsadmin commands that will be added.
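
A rough sketch of what building that report could look like (again, the names 
are made up for illustration; minReplicas stands for the namenode's minimum 
replication setting):

{code:java}
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the 'detailed report' pass described above.
class DetailedUpgradeReport {
  long atLeastMinReplicasUpgraded; // category 1
  long allReplicasUpgraded;        // category 2
  long noReplicasUpgraded;         // category 3

  // blockReplicas: datanode names holding each block's replicas.
  // upgraded: datanodes that have finished their upgrade.
  void build(List<List<String>> blockReplicas, Set<String> upgraded,
             int minReplicas) {
    for (List<String> replicas : blockReplicas) {
      int upgradedCount = 0;
      for (String dn : replicas) {
        if (upgraded.contains(dn)) {
          upgradedCount++;
        }
      }
      if (upgradedCount >= minReplicas)     { atLeastMinReplicasUpgraded++; }
      if (upgradedCount == replicas.size()) { allReplicasUpgraded++; }
      if (upgradedCount == 0)               { noReplicasUpgraded++; }
    }
  }

  // Stopping the upgrade manually is safe only if no block has zero
  // upgraded replicas.
  boolean safeToStopUpgrade() {
    return noReplicasUpgraded == 0;
  }
}
{code}

The new dfsadmin commands might end up looking something like 'dfsadmin 
-upgradeProgress details' and 'dfsadmin -upgradeProgress force'; those names 
are placeholders until the commands are actually added.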
 


> Block level CRCs in HDFS
> ------------------------
>
>                 Key: HADOOP-1134
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1134
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: Raghu Angadi
>            Assignee: Raghu Angadi
>         Attachments: bc-no-upgrade-05302007.patch, 
> DfsBlockCrcDesign-05305007.htm, readBuffer.java, readBuffer.java
>
>
> Currently CRCs are handled at the FileSystem level and are transparent to core 
> HDFS. See the recent improvement HADOOP-928 ( which can add checksums to a 
> given filesystem ) for more about it. Though this has served us well, there 
> are a few disadvantages:
> 1) This doubles the namespace in HDFS ( or other filesystem implementations ). 
> In many cases, it nearly doubles the number of blocks. Taking the namenode out 
> of CRCs would nearly double namespace performance, both in terms of CPU and 
> memory.
> 2) Since CRCs are transparent to HDFS, it cannot actively detect corrupted 
> blocks. With block level CRCs, the datanode can periodically verify the 
> checksums and report corruptions to the namenode so that new replicas can be 
> created.
> We propose to have CRCs maintained for all HDFS data in much the same way as 
> in GFS. I will update the jira with detailed requirements and a design. This 
> will include the same guarantees provided by the current implementation and 
> will include an upgrade of current data.
>  
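
To illustrate the periodic verification mentioned in point 2 above, here is a 
minimal sketch. It assumes a 512-byte bytes-per-checksum chunk and a separate 
file holding one CRC32 value per chunk; the actual on-disk layout and names 
will come from the design doc, so treat everything here as an assumption.

{code:java}
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.zip.CRC32;

// Hypothetical sketch of periodic block verification on a datanode.
class BlockVerifier {
  static final int BYTES_PER_CHECKSUM = 512; // assumed chunk size

  // Returns true iff every chunk of the block matches its stored CRC.
  static boolean verify(String blockFile, String crcFile) throws IOException {
    DataInputStream data = new DataInputStream(new FileInputStream(blockFile));
    DataInputStream sums = new DataInputStream(new FileInputStream(crcFile));
    try {
      byte[] buf = new byte[BYTES_PER_CHECKSUM];
      int n;
      while ((n = readChunk(data, buf)) > 0) {
        CRC32 crc = new CRC32();
        crc.update(buf, 0, n);
        if ((int) crc.getValue() != sums.readInt()) {
          // corruption; the datanode would report this to the namenode
          return false;
        }
      }
      return true;
    } finally {
      data.close();
      sums.close();
    }
  }

  // Reads up to buf.length bytes; returns the count read (0 at end of file).
  private static int readChunk(DataInputStream in, byte[] buf)
      throws IOException {
    int off = 0;
    while (off < buf.length) {
      int r = in.read(buf, off, buf.length - off);
      if (r < 0) break;
      off += r;
    }
    return off;
  }
}
{code}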

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
