[HA] hbase cluster should be able to ride over hdfs 'safe mode' flip and 
namenode restart/move
----------------------------------------------------------------------------------------------

                 Key: HBASE-2108
                 URL: https://issues.apache.org/jira/browse/HBASE-2108
             Project: Hadoop HBase
          Issue Type: Bug
            Reporter: stack
             Fix For: 0.21.0


Todd Lipcon wrote up the following speculation on what happens when the NN is 
restarted, goes away, or is replaced by a backup underneath HBase (see Dhruba's 
note at http://hadoopblog.blogspot.com/2009/11/hdfs-high-availability.html, 
which Eli pointed us at, for background on the 0.21 BackupNode feature):

"For regions that are already open, HBase can continue to serve reads so long 
as the regionservers are up and do not change state. This is because the HDFS 
client APIs cache the DFS block locations (a map of block ID -> datanode 
addresses) for open files.

"If any HBase action occurs that causes the regionservers to reopen a region 
(eg a region server fails, load balancing rebalances the region assignment, or 
a compaction finishes) then the reopen will fail as the new file will not be 
able to access the NameNode to receive the block locations. As these are all 
periodic operations for HBase, it's impossible to put a specific bound on this 
time, but my guess is that at least one region server is likely to crash within 
less than a minute of a NameNode unavailability.
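
To make the read-path split concrete, here is a minimal sketch using the plain 
Hadoop FileSystem API (not HBase code; the store file path is made up) of which 
calls need the NameNode and which only need datanodes:

{code}
// A minimal sketch of the read path described above, using the plain Hadoop
// FileSystem API (not HBase code); the store file path is made up.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReadPathSketch {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path storefile = new Path("/hbase/sometable/someregion/somestorefile");

    // open() asks the NameNode for the file's block locations and the DFS
    // client caches them in the returned stream.
    FSDataInputStream in = fs.open(storefile);

    // Reads go straight to datanodes using the cached locations, so an
    // already-open region keeps serving even if the NN goes away afterwards.
    byte[] buf = new byte[4096];
    int n = in.read(buf);
    System.out.println("read " + n + " bytes");
    in.close();

    // Reopening a region means fresh open() calls; each one must reach the
    // NameNode again, so with the NN down the reopen fails.
    FSDataInputStream reopened = fs.open(storefile);
    reopened.close();
  }
}
{code}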

"Similar properties hold for writes. HBase's writing behavior is limited to 
Commit Logs which are kept open by the region servers. Writes to commit logs 
that are already open will continue to succeed, since they only involve the 
datanodes, but if a region server rolls an edit log, the open() for the new log 
will fail if the NN is unavailable. There is currently some work going on in 
HBase trunk to preallocate open files for commit logs to avoid this issue, but 
it is not complete, and it is not a full solution for the issue. The other 
issue is that the close() call that completes the write of a commit log also 
depends on a functioning NameNode - if it is unavailable, the log will be left 
in an indeterminate state and the edits may become lost when the NN recovers.
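
The same split for the write path, sketched with the plain FileSystem API 
(again not the actual HLog code, and the log file names are invented):

{code}
// Sketch of the commit-log write path: create() and close() need the
// NameNode, appends only need the datanode pipeline. Not HBase's HLog code.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class LogRollSketch {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());

    // create() allocates the file and its first block via the NameNode.
    FSDataOutputStream log = fs.create(new Path("/hbase/.logs/server/hlog.1"));

    // Appending edits only talks to the datanode pipeline (until a new block
    // must be allocated), so writes to an already-open log survive a NN outage.
    log.write("some edit".getBytes("UTF-8"));
    log.sync();   // flush to the pipeline; hflush() in later Hadoop versions

    // Rolling the log = close the old file and create a new one. Both the
    // close() (which completes the file at the NN) and the new create()
    // require a live NameNode; either failing leaves the log in the
    // indeterminate state described above.
    log.close();
    FSDataOutputStream next = fs.create(new Path("/hbase/.logs/server/hlog.2"));
    next.close();
  }
}
{code}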

"The rolling of commit logs is triggered either when a timer elapses or when a 
certain amount of data has been written. Thus, this failure mode will trigger 
quickly when data is constantly being written to the cluster. If little data is 
being written, it still may trigger due to the automatic periodic log rolling.
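
For illustration only, a roll decision of the kind described above might look 
like the following; the thresholds and names are hypothetical, not HBase's 
actual LogRoller settings:

{code}
// Illustrative only: a size-or-time roll check of the kind described above.
// Thresholds and names are hypothetical, not HBase's actual settings.
public class RollPolicySketch {
  private final long rollSizeBytes = 64L * 1024 * 1024; // roll after ~64MB of edits
  private final long rollPeriodMs  = 60L * 60 * 1000;   // or after an hour, even if idle
  private long bytesWritten = 0;
  private long lastRoll = System.currentTimeMillis();

  /** Called after each batch of edits is appended to the commit log. */
  boolean shouldRoll(long newBytes) {
    bytesWritten += newBytes;
    return bytesWritten >= rollSizeBytes
        || (System.currentTimeMillis() - lastRoll) >= rollPeriodMs;
  }

  /** Reset once a roll has actually completed. */
  void rolled() {
    bytesWritten = 0;
    lastRoll = System.currentTimeMillis();
  }
}
{code}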

"Given these above failure modes, I don't believe there is an effective HA 
solution for HBase at this point. Although HBase may continue to operate for a 
short time period while a NN recovers, it is also possible that it will fail 
nearly immediately, depending on when HBase's periodic operations happen to 
occur. Even with an automatic failover like DRBD+Heartbeat on the NN, the 
downtime may last 5-10 minutes as the new NN must both replay the edit log and 
receive block reports from every datanode before it can exit safe mode. I 
believe this will cause most NN failovers to be accompanied by a partial or 
complete failure of the HBase cluster."
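
For reference, a client can poll whether the (new) NameNode is still in safe 
mode via DistributedFileSystem.setSafeMode(SAFEMODE_GET). A hedged sketch; note 
that the SafeModeAction constant lives in FSConstants in 0.20/0.21 and moved in 
later releases:

{code}
// Hedged sketch: poll whether the NameNode is still in safe mode. In Hadoop
// 0.20/0.21 SafeModeAction lives in FSConstants; it moved in later releases.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.FSConstants.SafeModeAction;

public class WaitForSafeModeExit {
  public static void main(String[] args) throws Exception {
    DistributedFileSystem dfs =
        (DistributedFileSystem) FileSystem.get(new Configuration());

    // SAFEMODE_GET only queries the current state; it does not change it.
    while (dfs.setSafeMode(SafeModeAction.SAFEMODE_GET)) {
      System.out.println("NameNode still in safe mode, waiting...");
      Thread.sleep(10000L);
    }
    System.out.println("NameNode has left safe mode.");
  }
}
{code}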

The above makes sense to me.  Let's fix it.  Generally our mode up to this 
point has been that if HDFS goes away, we deal with it on a 
regionserver-by-regionserver basis, each shutting itself down to protect 
against data loss.  We need to handle riding over a NN restart or a change of 
server.
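
One possible shape of the fix, sketched only as a starting point: wrap 
NN-dependent operations (log roll, region open) in a bounded retry/wait loop 
instead of aborting the regionserver on the first failure. Names here are 
hypothetical:

{code}
// One possible shape of the fix, sketched: retry NameNode-dependent operations
// (log roll, region open) for a bounded period instead of aborting the
// regionserver on the first failure. Names here are hypothetical.
import java.io.IOException;

public class NamenodeRetrySketch {

  interface NamenodeOp<T> {
    T call() throws IOException;
  }

  /** Retry an NN-dependent operation while the NameNode restarts or fails over. */
  static <T> T withRetries(NamenodeOp<T> op, int maxAttempts, long sleepMs)
      throws IOException {
    IOException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return op.call();
      } catch (IOException e) {
        last = e;               // e.g. SafeModeException or connection refused
        try {
          Thread.sleep(sleepMs);
        } catch (InterruptedException ie) {
          Thread.currentThread().interrupt();
          throw new IOException("Interrupted waiting for NameNode", ie);
        }
      }
    }
    throw last;                 // give up; fall back to the existing shutdown path
  }
}
{code}

Whether the bound should be long enough to cover a full safe-mode exit (minutes) 
or be configurable is part of what this issue needs to decide.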

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
