[ 
https://issues.apache.org/jira/browse/HDFS-2064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13056402#comment-13056402
 ] 

Konstantin Shvachko commented on HDFS-2064:
-------------------------------------------

Rob, thanks for the design review. Some clarifications and comments.
Yes, I consider this design a specification of Sanjay's and Suresh's design, 
in the sense that it is a minimalistic (in terms of changes to the existing 
code) approach dedicated to one direction - building HA based on StandbyNode.

> 1. K7.8
Accelerating block reports after failover is indeed an optimization. Good 
point, during normal operations both BlockMaps should be in sync. The 
acceleration targets the case when the SBN has missed a lot of block reports, 
which could be monitored on the SBN webUI or via metrics.

> 3. What is the scope of VIP solutions?
Talking to different people, I came to the conclusion that failover within one 
rack is sufficient. 
- First of all, VIP is a good abstraction and if current implementation does 
not satisfy certain needs, the networking industry will find a way to innovate. 
- Second, the rack that runs NN and SBN can be designed more reliably than 
regular (DataNode) racks. With 2 TOR switches. With bonded interfaces (forgive 
me if I get the terminology wrong) inside the rack and outside for fault 
tolerance.
- Third, there are disasters that require a 9.0 magnitude earthquake followed 
by a tsunami to happen. Should Hadoop be designed for that? Probably not. I 
just need to hit the 99.94% availability mark.
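
To put the 99.94 target in perspective, here is a back-of-the-envelope 
calculation. It assumes the figure is a percentage measured over a 365-day 
year; the class and method names are made up for illustration.

```java
/** Hypothetical helper for downtime-budget arithmetic. */
final class DowntimeBudget {
    /** Hours of allowed downtime per 365-day year at the given availability percentage. */
    static double hoursPerYear(double availabilityPercent) {
        double unavailableFraction = 1.0 - availabilityPercent / 100.0;
        return unavailableFraction * 365 * 24;
    }

    public static void main(String[] args) {
        System.out.printf("%.2f hours/year%n", hoursPerYear(99.94));
    }
}
```

Under those assumptions the budget is a little over five hours of downtime 
per year.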

> 4. the stale deletion request problem
I hoped I covered it in 7.9. But I see now that this section needs more details 
and I missed the third important case, when setReplication() is explicitly 
decreasing the replication. I hope we can solve it by adding replica locations 
to logSetReplication(). I'll update this section.

> 6. "leader election". Is the world really symmetric?
NN and SBN are asymmetric. And this simplifies things a lot: I am the active 
NN if I have the nn.vip. Some other node can think it is active, but since it 
doesn't have the vip, its ambitions don't matter as nobody can talk to it. 
The asymmetric design eliminates leader election and client fencing. It's a 
good thing.
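
The "I am active if I have the nn.vip" rule can be sketched as a local check. 
This is only an illustration, not part of the design document: the class 
VipCheck and its method are hypothetical, and it assumes the VIP shows up as 
an ordinary address on one of the host's network interfaces.

```java
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.net.UnknownHostException;

/** Hypothetical helper: a node treats itself as active only while it holds the VIP. */
final class VipCheck {
    /** Returns true if the given address is bound to an interface on this host. */
    static boolean holdsAddress(String address)
            throws UnknownHostException, SocketException {
        InetAddress addr = InetAddress.getByName(address);
        // getByInetAddress returns null when no local interface has this address.
        return NetworkInterface.getByInetAddress(addr) != null;
    }

    public static void main(String[] args) throws Exception {
        // The address to check is passed on the command line; loopback is just a default.
        String vip = args.length > 0 ? args[0] : "127.0.0.1";
        System.out.println(holdsAddress(vip)
                ? "active: this host holds " + vip
                : "not active: " + vip + " is held elsewhere");
    }
}
```

The point of the sketch is that the decision is purely local: no node has to 
ask its peer who the leader is.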

> 8. spooling edits on secondary storage.
By secondary storage you mean a filer or BookKeeper, I think. A filer is 
enterprise storage. We are building a distributed storage system based on 
commodity components, and adding a dependency on enterprise storage seems 
counterintuitive to me. Any shared storage solution will require solving 
synchronization problems, see my note about addBlock() in 7.5. If 
blockReceived() arrives at the SBN before addBlock(), this replica is lost for 
another hour. addBlock() must be synchronous in order to avoid such a race 
condition.
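
The ordering problem can be shown with a toy model of the standby's block 
map. Everything here is a hypothetical sketch: the class StandbyBlockMap is 
not HDFS code, and it assumes the standby simply drops a blockReceived() for a 
block it has not heard about yet.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy model of a standby's block map; all names here are hypothetical. */
final class StandbyBlockMap {
    private final Map<Long, Set<String>> replicas = new HashMap<>();

    /** The standby learns that a block exists (the effect of addBlock() reaching it). */
    void addBlock(long blockId) {
        replicas.putIfAbsent(blockId, new HashSet<>());
    }

    /** A DataNode reports a replica; reports for unknown blocks are dropped. */
    boolean blockReceived(long blockId, String dataNode) {
        Set<String> locations = replicas.get(blockId);
        if (locations == null) {
            return false; // block not known yet: replica is lost until the next block report
        }
        locations.add(dataNode);
        return true;
    }

    /** Number of replica locations the standby currently knows for the block. */
    int replicaCount(long blockId) {
        Set<String> locations = replicas.get(blockId);
        return locations == null ? 0 : locations.size();
    }

    public static void main(String[] args) {
        StandbyBlockMap racy = new StandbyBlockMap();
        racy.blockReceived(1L, "dn1"); // arrives first and is dropped
        racy.addBlock(1L);             // arrives late
        System.out.println("racy order, replicas known: " + racy.replicaCount(1L));

        StandbyBlockMap ordered = new StandbyBlockMap();
        ordered.addBlock(1L);             // synchronous addBlock() reaches the standby first
        ordered.blockReceived(1L, "dn1"); // now the replica is recorded
        System.out.println("synchronous order, replicas known: " + ordered.replicaCount(1L));
    }
}
```

In the racy order the standby ends up knowing the block but none of its 
replicas, which is the situation a synchronous addBlock() is meant to rule 
out.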

> in a practical BN deployment, is there a remaining need for some shared 
> storage?
I don't see any.

> Warm HA NameNode going Hot
> --------------------------
>
>                 Key: HDFS-2064
>                 URL: https://issues.apache.org/jira/browse/HDFS-2064
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: name-node
>    Affects Versions: 0.22.0
>            Reporter: Konstantin Shvachko
>            Assignee: Konstantin Shvachko
>         Attachments: WarmHA-GoingHot.pdf
>
>
> This is the design for automatic hot HA for HDFS NameNode. It involves use of 
> HA software and LoadReplicator - external to Hadoop components, which 
> substantially simplify the architecture by separating HA- from 
> Hadoop-specific problems. Without the external components it provides warm 
> standby with manual failover.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
