Me and Devaraj attended their talk on their solution for paxos based namenode and HBase replication.
They have two solutions, one for single datacenter, and the other multi DC geo replication. For the namenode, there is a wrapper, called ConsistencyNode, that basically gets the requests, replicate it via their consistency protocol to other CNodes within the DC (paxos based) in the edit log. If the proposal for this is accepted, the changes are made durable. However, from my understanding, on the read side the client chooses only one replica to read. The client decides to connect to one of the replica namenodes, which means that it is not doing a paxos read. I think they also wrapped the client, so that if it gets a FileNotFoundException or something similar, it will retry on a different server. Also they track the last seen proposal id as a transaction id for this as well from my understanding (so read-what-you-write consistency maybe?). The full details of the consistency was not clear to me from the presentation. For their multi-DC replication, they are doing a similar thing, but the data replication is not handled by paxos, only the namenode metadata. For each datacenter, they have a target replication factor (can be set differently for each DC, like 0 because of regulatory reasons). The metadata of NN is replicated via a similar mechanism. The data replication is async to the metadata replication though. When a block is finalized, the CNode quorum on that particular DC, schedules a remote copy to one of the datacenters. That copy job, copies the block with directly writing the block from the datanode to a remote datanode. Then that remote DC block is replicated to the target replication by that DC's CNode quorum. When the target is reached, that DC will create another proposal about the data replication being complete. So the state machine probably contains where each data is replicated, but they were still mentioning the client getting DataNotReplicatedException or something. Their work on HBase is still WIP. I do not remember much details on the protocol, except it uses the same replication protocol (their "patented" paxos based replication). Of course the devil is in the details. I did not get that from the presentation. As a side note, Doug when asked, was saying that they are cooking something for backups, so maybe their "secret project" also contains multi-DC consistent state? Enis On Sat, Apr 5, 2014 at 1:55 AM, Ted Yu <[email protected]> wrote: > Enis: > There was a talk by Konstantin > Boudnik<http://hadoopsummit.org/amsterdam/speakers/#konstantin-boudnik> > . > > Any interesting material from his presentation ? > > Cheers >
