[ https://issues.apache.org/jira/browse/HADOOP-1134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12507049 ]
Raghu Angadi commented on HADOOP-1134:
--------------------------------------

Brief outline of when an upgrade is considered complete and how a manual override works:

* The upgrade process starts when normal safemode conditions are met.
* Any datanode that is already registered, or that heartbeats after the upgrade starts, will be asked to perform a datanode upgrade.
* Without a manual override, the namenode waits for *all* the nodes it knew about to complete their upgrade.
* Every few minutes the namenode prints a brief message in its log about the current status.
* The brief report lists the datanodes that have not finished when only a handful are left (maybe <= 10?).
* If some datanodes go down after the upgrade starts, the automatic upgrade might never finish.

Manual override:

* When an admin notices an upgrade that seems to be stuck, a 'detailed report' can be requested.
* A detailed report iterates through all the blocks and counts how many fall into each of the following categories:
*# At least minReplicas replicas placed on upgraded datanodes.
*# All of the replicas placed on upgraded nodes.
*# None of the replicas placed on upgraded nodes.
* Based on this data, an admin can decide to either manually stop the upgrade (if there are no blocks with zero upgraded replicas) or bring a dead datanode back up.
* The detailed report and the manual override will be available through new dfsadmin commands that will be added.

> Block level CRCs in HDFS
> ------------------------
>
>                 Key: HADOOP-1134
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1134
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: Raghu Angadi
>            Assignee: Raghu Angadi
>         Attachments: bc-no-upgrade-05302007.patch, DfsBlockCrcDesign-05305007.htm, readBuffer.java, readBuffer.java
>
>
> Currently CRCs are handled at the FileSystem level and are transparent to core HDFS. See the recent improvement HADOOP-928 (which can add checksums to a given filesystem) for more about it.
> Though this has served us well, there are a few disadvantages:
> 1) It doubles the namespace in HDFS (or in other filesystem implementations). In many cases, it nearly doubles the number of blocks. Taking the namenode out of CRCs would nearly double namespace performance, both in terms of CPU and memory.
> 2) Since CRCs are transparent to HDFS, it cannot actively detect corrupted blocks. With block-level CRCs, a datanode can periodically verify the checksums and report corruptions to the namenode so that new replicas can be created.
> We propose to maintain CRCs for all HDFS data in much the same way as in GFS. I will update the jira with detailed requirements and a design. This will include the same guarantees provided by the current implementation and an upgrade of current data.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
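As an aside, the detailed-report categorization described in the comment above could be sketched roughly as follows. This is only an illustrative outline, not actual Hadoop code: the `BlockInfo`, `Report`, and `detailedReport` names are hypothetical, and the real namenode would walk its own block map rather than a plain collection.

```java
import java.util.*;

public class UpgradeReport {

    // Hypothetical per-block view: total replicas and how many of
    // them currently sit on datanodes that have finished upgrading.
    static class BlockInfo {
        final int replicas;
        final int upgradedReplicas;
        BlockInfo(int replicas, int upgradedReplicas) {
            this.replicas = replicas;
            this.upgradedReplicas = upgradedReplicas;
        }
    }

    // Counters for the three categories named in the comment.
    static class Report {
        int atLeastMinReplicas; // >= minReplicas on upgraded datanodes
        int allUpgraded;        // every replica on an upgraded node
        int noneUpgraded;       // no replica on an upgraded node
    }

    // Iterate through all blocks and bucket each one; a block can
    // count toward both the first and second categories.
    static Report detailedReport(Collection<BlockInfo> blocks, int minReplicas) {
        Report r = new Report();
        for (BlockInfo b : blocks) {
            if (b.upgradedReplicas >= minReplicas) r.atLeastMinReplicas++;
            if (b.upgradedReplicas == b.replicas)  r.allUpgraded++;
            if (b.upgradedReplicas == 0)           r.noneUpgraded++;
        }
        return r;
    }

    public static void main(String[] args) {
        List<BlockInfo> blocks = Arrays.asList(
            new BlockInfo(3, 3),   // fully upgraded
            new BlockInfo(3, 1),   // partially upgraded
            new BlockInfo(3, 0));  // stuck: no upgraded replica
        Report r = detailedReport(blocks, 1);
        System.out.println(r.atLeastMinReplicas + " " + r.allUpgraded + " " + r.noneUpgraded);
    }
}
```

Per the comment's rule, an admin would only force-stop the upgrade when the `noneUpgraded` count is zero; otherwise the safer option is to bring a dead datanode back up.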