Hi Thanh,

No, I doubt that anybody is running BackupNode in production, since it's
only in 0.21 and is, in my opinion, an incomplete implementation. A few of
the deficiencies I'm aware of:

- As you said, edits are transferred by synchronous RPC from the NN. As
far as I know, there are no timeouts enabled on these RPCs, so if the
BackupNode hangs, so will the primary. In the case of a BN crash, the
primary will hang for many minutes before noticing.
- The BN doesn't provide hot standby since it doesn't yet receive block
reports.

The fact that the RPCs are synchronous seems unavoidable if you want to be
able to do a failover without any lost edits. But without timeouts, it's a
bit scary.
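
To make the timeout point concrete, here is a minimal sketch of a
synchronous call that gives up after a deadline instead of blocking
forever. It is plain Python, not Hadoop code: SlowBackup, apply_edit and
send_edit are hypothetical names made up for illustration, and the real
NN-to-BN RPC path is not modelled.

```python
import concurrent.futures
import time


class SlowBackup:
    """Hypothetical stand-in for a backup node; not a Hadoop class."""

    def __init__(self, delay_s):
        self.delay_s = delay_s
        self.applied = []

    def apply_edit(self, edit):
        time.sleep(self.delay_s)  # simulate a slow or hung node
        self.applied.append(edit)
        return True


def send_edit(pool, backup, edit, timeout_s):
    """Send one edit and wait for the ack, but only up to timeout_s seconds.

    Returns True if the backup acknowledged in time, False otherwise.
    """
    future = pool.submit(backup.apply_edit, edit)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        # The caller should now treat the backup as out of sync and stop
        # streaming to it, rather than blocking indefinitely.
        return False


if __name__ == "__main__":
    with concurrent.futures.ThreadPoolExecutor(max_workers=2) as pool:
        healthy = SlowBackup(delay_s=0.0)
        hung = SlowBackup(delay_s=0.5)
        print(send_edit(pool, healthy, "mkdir /a", timeout_s=0.2))
        print(send_edit(pool, hung, "mkdir /b", timeout_s=0.05))
```

Note that a call that times out may still be applied on the other side
later, so the sender cannot assume the backup is merely one edit behind;
it has to treat it as failed and resynchronize it before using it again.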

Some work will be going on in trunk to address high availability over the
next several months - we'd definitely appreciate your expertise in failure
injection, etc, being applied to the new code as it goes in!

-Todd

On Thu, May 5, 2011 at 9:28 AM, Thanh Do <than...@cs.wisc.edu> wrote:

> hi all,
>
> Has anybody deployed the Backup Node in their system?
> I am curious about the impact of the Backup Node
> on NameNode throughput.
>
> To my understanding, the NameNode streams each edit
> log operation to the BackupNode (by an RPC call),
> and only returns once that operation has been applied
> to the in-memory state of the Backup Node.
>
> Will this RPC call slow down the NameNode a little bit?
>
> Thanks
> Thanh
>



-- 
Todd Lipcon
Software Engineer, Cloudera
