Thanks Bobby, this is the kind of thing I'm looking for. Now the question is which strategy is best: Hot/Hot or Hot/Warm. I need to consider CPU and network bandwidth, and I also need to decide at which layer the replication should start.
Regards,
Abhishek

On Mon, Apr 16, 2012 at 7:08 AM, Robert Evans <ev...@yahoo-inc.com> wrote:
> Hi Abhishek,
>
> Manu is correct about High Availability within a single colo. I realize
> that in some cases you have to have failover between colos. I am not
> aware of any turnkey solution for things like that, but generally what you
> want to do is to run two clusters, one in each colo, either hot/hot or
> hot/warm, and I have seen both depending on how quickly you need to fail
> over. In hot/hot the input data is replicated to both clusters and the
> same software is run on both. In this case, though, you have to be fairly
> sure that your processing is deterministic, or the results could be
> slightly different (e.g. no generating of random IDs). In hot/warm the
> data is replicated from one colo to the other at defined checkpoints. The
> data is only processed on one of the grids, but if that colo goes down the
> other one can take up the processing from wherever the last checkpoint
> was.
>
> I hope that helps.
>
> --Bobby
>
> On 4/12/12 5:07 AM, "Manu S" <manupk...@gmail.com> wrote:
>
> Hi Abhishek,
>
> 1. Use multiple directories for *dfs.name.dir* & *dfs.data.dir* etc.
>    * Recommendation: write to *two local directories on different
>      physical volumes*, and to an *NFS-mounted* directory
>      - Data will be preserved even in the event of a total failure of the
>        NameNode machines
>    * Recommendation: *soft-mount the NFS* directory
>      - If the NFS mount goes offline, this will not cause the NameNode
>        to fail
>
> 2. *Rack awareness*
>    https://issues.apache.org/jira/secure/attachment/12345251/Rack_aware_HDFS_proposal.pdf
>
> On Thu, Apr 12, 2012 at 2:18 AM, Abhishek Pratap Singh
> <manu.i...@gmail.com> wrote:
>
> > Thanks Robert.
> > Is there a best practice or design that can address High Availability
> > to a certain extent?
> >
> > ~Abhishek
> >
> > On Wed, Apr 11, 2012 at 12:32 PM, Robert Evans <ev...@yahoo-inc.com>
> > wrote:
> >
> > > No, it does not. Sorry.
> > >
> > > On 4/11/12 1:44 PM, "Abhishek Pratap Singh" <manu.i...@gmail.com>
> > > wrote:
> > >
> > > Hi All,
> > >
> > > Just wanted to know if Hadoop supports more than one data centre. This
> > > is basically for DR purposes and High Availability, where if one centre
> > > goes down the other can take over.
> > >
> > > Regards,
> > > Abhishek
>
> --
> Thanks & Regards
> ----
> *Manu S*
> SI Engineer - OpenSource & HPC
> Wipro Infotech
> Mob: +91 8861302855 Skype: manuspkd
> www.opensourcetalk.co.in
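Manu's first recommendation amounts to an hdfs-site.xml fragment. To stay in one language, the minimal Python sketch below renders such a fragment; the directory paths are hypothetical placeholders, and the property names are the Hadoop 1.x names used in the thread (later releases rename them to dfs.namenode.name.dir and dfs.datanode.data.dir). The NFS soft mount itself is an OS-level mount option and does not appear in this file.

```python
import xml.etree.ElementTree as ET

# Hypothetical paths: two local directories on different physical volumes plus
# a soft-mounted NFS export, following Manu's recommendation.
NAME_DIRS = ["/disk1/hdfs/name", "/disk2/hdfs/name", "/mnt/nfs/hdfs/name"]

# Hypothetical DataNode disks. Several entries here spread block storage across
# the disks; they do not mirror blocks (HDFS block replication does that).
DATA_DIRS = ["/disk1/hdfs/data", "/disk2/hdfs/data"]


def build_hdfs_site(props: dict) -> str:
    """Render a Hadoop-style <configuration> document from name -> value pairs."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    config_xml = build_hdfs_site({
        # The NameNode keeps a full copy of its metadata (fsimage and, by
        # default, the edit log) in every directory of this comma-separated list.
        "dfs.name.dir": ",".join(NAME_DIRS),
        "dfs.data.dir": ",".join(DATA_DIRS),
    })
    print(config_xml)
```

Losing one local disk, or the whole NameNode machine, still leaves a copy of the metadata on the NFS export, which is the point of Manu's layout.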
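Bobby's hot/hot caveat (the same job runs in both colos, so processing must be deterministic) can be made concrete with record IDs. This is a minimal Python sketch, assuming each record can be identified by a hypothetical source path, byte offset and content. Hashing those values yields the same ID in both colos, whereas a random uuid4 does not. Including the offset keeps two identical lines at different positions distinct.

```python
import hashlib
import uuid


def random_record_id() -> str:
    # Non-deterministic: running the same job in two colos assigns
    # different IDs to the same input record, so the outputs diverge.
    return str(uuid.uuid4())


def deterministic_record_id(source: str, offset: int, record: str) -> str:
    # Deterministic: the ID depends only on where the record came from and
    # what it contains, so both colos compute the same value as long as
    # they received identical replicated input.
    key = f"{source}:{offset}:{record}".encode("utf-8")
    return hashlib.sha256(key).hexdigest()[:16]


if __name__ == "__main__":
    # Hypothetical input record, replicated identically to both colos.
    rec = ("/input/2012-04-16/events.log", 1024, "user=42 action=login")
    colo_a = deterministic_record_id(*rec)
    colo_b = deterministic_record_id(*rec)
    print(colo_a == colo_b)  # True
    print(random_record_id() == random_record_id())  # almost certainly False
```

The same idea applies to anything else a job might draw at random (sampling, tie-breaking): seed it from the input rather than from the clock.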
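To reason about the CPU-versus-bandwidth question for hot/warm, here is a toy simulation (not Hadoop code) of the checkpointing Bobby describes: only the hot colo processes data, its output is copied to the warm colo every `checkpoint_every` batches, and after a failure the warm colo resumes from the last checkpoint. The batch numbering and the failure point are hypothetical.

```python
from typing import List, Optional, Tuple


def run_hot_warm(batches: List[int], checkpoint_every: int,
                 fail_after: Optional[int] = None) -> Tuple[List[int], int]:
    """Toy hot/warm model. Returns (batches whose output exists at the end,
    number of batches the warm colo had to redo after failover)."""
    hot_done: List[int] = []   # output present in the hot colo
    warm_done: List[int] = []  # output copied to the warm colo at the last checkpoint
    for i, batch in enumerate(batches):
        if fail_after is not None and i == fail_after:
            # Hot colo goes down: work done since the last checkpoint is lost and
            # must be recomputed by the warm colo, along with the remaining batches.
            redone = len(hot_done) - len(warm_done)
            warm_done.extend(batches[len(warm_done):])
            return warm_done, redone
        hot_done.append(batch)
        if len(hot_done) % checkpoint_every == 0:
            # Checkpoint: replicate the hot colo's output to the warm colo.
            warm_done = list(hot_done)
    return hot_done, 0


if __name__ == "__main__":
    batches = list(range(10))
    for every in (1, 3, 5):
        done, redone = run_hot_warm(batches, checkpoint_every=every, fail_after=8)
        # Prints 0, 2 and 3 redone batches for intervals 1, 3 and 5.
        print(f"checkpoint every {every}: {redone} batch(es) redone after failover")
```

Frequent checkpoints mean more inter-colo copies (network bandwidth) but less rework after failover (CPU); infrequent checkpoints trade the other way. The actual copy between clusters is commonly done with Hadoop's DistCp tool (`hadoop distcp`).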