叶双明 wrote:
Thanks for paying attention to my tentative idea!
What I had in mind isn't how to store the metadata, but a final (or
last-resort) way to recover valuable data in the cluster when the worst
happens and the metadata on every NameNode is destroyed. For example, if a
terrorist attack or natural disaster destroys half the cluster's nodes
along with all the NameNodes, this mechanism would let us recover as much
data as possible, and we'd have a good chance of recovering all of the
cluster's data because of the original replication.
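Concretely, the last resort amounts to walking each surviving DataNode's
local storage and salvaging the raw block files. A minimal sketch,
assuming the standard dfs.data.dir/current block layout; the path below
is illustrative, not anyone's actual configuration:

    import java.io.File;

    public class BlockSalvage {
        public static void main(String[] args) {
            // Hypothetical DataNode storage dir; adjust to your dfs.data.dir.
            File dataDir = new File("/var/hadoop/dfs/data/current");
            File[] files = dataDir.listFiles();
            if (files == null) {
                System.err.println("No such directory: " + dataDir);
                return;
            }
            for (File f : files) {
                // HDFS block files are named blk_<id>; the .meta files
                // alongside them hold the per-block checksums.
                if (f.getName().startsWith("blk_")
                        && !f.getName().endsWith(".meta")) {
                    System.out.println("Recoverable block: " + f.getName()
                            + " (" + f.length() + " bytes)");
                }
            }
        }
    }

Stitching those blocks back into files is the hard part, of course, since
the block-to-file mapping lives in the lost metadata.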
If you want to survive any event that loses a datacentre, you need to
mirror the data off site, choosing that second site with an up-to-date
fault-line map of the city, geological knowledge of where recent
eruptions ended up, etc. Which is why nobody builds datacentres in
Enumclaw, WA, that I'm aware of; the spec for the fabs in/near Portland
is that they ought to withstand 1-2m of volcanic ash landing on them
(what they'd have got if there'd been an easterly wind when Mount Saint
Helens went). Then, once you have a safe location for the second site,
talk to your telco about how the high-bandwidth backbones in your city
flow (Metropolitan Area Ethernet and the like), and try to find somewhere
that meets your requirements.
Then: come up with a protocol that efficiently keeps the two sites up to
date. And reliably: S3 went down last month because they'd been using a
gossip-style update protocol but weren't checksumming everything, on the
assumption that there's no need on a LAN; but of course on a cross-city
network more things can go wrong, and for them they did.
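The fix is cheap. A minimal sketch of the kind of defensive checksumming
that gets skipped on a LAN; this illustrates the principle, not S3's
actual protocol, and the framing format is made up for the example:

    import java.util.zip.CRC32;

    public class ChecksummedUpdate {
        // Prefix an update payload with a CRC32 so the receiver can
        // detect corruption introduced on the wire.
        static byte[] frame(byte[] payload) {
            CRC32 crc = new CRC32();
            crc.update(payload);
            long sum = crc.getValue();
            byte[] framed = new byte[payload.length + 8];
            for (int i = 0; i < 8; i++) {
                framed[i] = (byte) (sum >>> (56 - 8 * i));
            }
            System.arraycopy(payload, 0, framed, 8, payload.length);
            return framed;
        }

        // Returns the payload, or null if the checksum doesn't match,
        // in which case the update should be dropped and re-requested.
        static byte[] unframe(byte[] framed) {
            long expected = 0;
            for (int i = 0; i < 8; i++) {
                expected = (expected << 8) | (framed[i] & 0xFFL);
            }
            byte[] payload = new byte[framed.length - 8];
            System.arraycopy(framed, 8, payload, 0, payload.length);
            CRC32 crc = new CRC32();
            crc.update(payload);
            return crc.getValue() == expected ? payload : null;
        }

        public static void main(String[] args) {
            byte[] wire = frame("block report for site 2".getBytes());
            System.out.println(unframe(wire) != null
                    ? "update accepted" : "corrupt update dropped");
        }
    }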
Something to keep multiple Hadoop filesystems synchronised efficiently
and reliably across sites could be very useful to many people.
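The closest existing tool is distcp, which does a bulk parallel copy
between clusters rather than continuous synchronisation, along the lines
of (host names here are placeholders):

    hadoop distcp hdfs://nn-site1/data hdfs://nn-site2/data

Run periodically that gets you a point-in-time mirror; the interesting
work is in making it incremental and continuous.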
-steve