If you want to start an open source project for this, I am sure there are others with the same problem who might be very willing to help out. :)
--Bobby Evans

On 4/19/12 4:31 PM, "Michael Segel" <michael_se...@hotmail.com> wrote:

I don't know of any open source solution for doing this... And yeah, it's something one can't talk about.... ;-)

On Apr 19, 2012, at 4:28 PM, Robert Evans wrote:

> Where I work we have done some things like this, but none of them are open
> source, and I have not really been directly involved with the details of it.
> I can guess at what it would take, but that is all it would be at this point.
>
> --Bobby
>
> On 4/17/12 5:46 PM, "Abhishek Pratap Singh" <manu.i...@gmail.com> wrote:
>
> Thanks Bobby, I'm looking for something like this. Now the question is
> what is the best strategy for hot/hot or hot/warm. I need to consider CPU
> and network bandwidth, and also need to decide from which layer this
> replication should start.
>
> Regards,
> Abhishek
>
> On Mon, Apr 16, 2012 at 7:08 AM, Robert Evans <ev...@yahoo-inc.com> wrote:
>
>> Hi Abhishek,
>>
>> Manu is correct about high availability within a single colo. I realize
>> that in some cases you have to have failover between colos. I am not
>> aware of any turnkey solution for things like that, but generally what
>> you want to do is run two clusters, one in each colo, either hot/hot or
>> hot/warm; I have seen both, depending on how quickly you need to fail
>> over. In hot/hot, the input data is replicated to both clusters and the
>> same software is run on both. In this case, though, you have to be
>> fairly sure that your processing is deterministic, or the results could
>> be slightly different (i.e. no generating of random IDs). In hot/warm,
>> the data is replicated from one colo to the other at defined
>> checkpoints. The data is only processed on one of the grids, but if
>> that colo goes down the other one can take up the processing from
>> wherever the last checkpoint was.
>>
>> I hope that helps.
>>
>> --Bobby
>>
>> On 4/12/12 5:07 AM, "Manu S" <manupk...@gmail.com> wrote:
>>
>> Hi Abhishek,
>>
>> 1.
>> Use multiple directories for *dfs.name.dir* & *dfs.data.dir* etc.
>>    * Recommendation: write to *two local directories on different
>>      physical volumes*, and to an *NFS-mounted* directory
>>      - Data will be preserved even in the event of a total failure of
>>        the NameNode machines
>>    * Recommendation: *soft-mount the NFS* directory
>>      - If the NFS mount goes offline, this will not cause the NameNode
>>        to fail
>>
>> 2. *Rack awareness*
>>
>> https://issues.apache.org/jira/secure/attachment/12345251/Rack_aware_HDFS_proposal.pdf
>>
>> On Thu, Apr 12, 2012 at 2:18 AM, Abhishek Pratap Singh
>> <manu.i...@gmail.com> wrote:
>>
>>> Thanks Robert.
>>> Is there a best practice or design that can address high availability
>>> to a certain extent?
>>>
>>> ~Abhishek
>>>
>>> On Wed, Apr 11, 2012 at 12:32 PM, Robert Evans <ev...@yahoo-inc.com>
>>> wrote:
>>>
>>>> No it does not. Sorry.
>>>>
>>>> On 4/11/12 1:44 PM, "Abhishek Pratap Singh" <manu.i...@gmail.com>
>>>> wrote:
>>>>
>>>> Hi All,
>>>>
>>>> Just wanted to know if Hadoop supports more than one data centre. This
>>>> is basically for DR purposes and high availability, where if one
>>>> centre goes down the other can be brought up.
>>>>
>>>> Regards,
>>>> Abhishek
>>
>> --
>> Thanks & Regards
>> ----
>> *Manu S*
>> SI Engineer - OpenSource & HPC
>> Wipro Infotech
>> Mob: +91 8861302855 Skype: manuspkd
>> www.opensourcetalk.co.in
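The rack awareness Manu mentions is typically wired up through a topology script that Hadoop invokes with datanode addresses, expecting one rack path per line on stdout. A minimal sketch; the subnet-to-rack mapping and rack names below are purely illustrative, not from this thread:

```shell
#!/bin/sh
# Hypothetical topology script. Hadoop calls the script configured
# via the topology.script.file.name property with one or more
# datanode IPs/hostnames and reads back one rack path per line.

# Map a single IP/hostname to a rack path (illustrative subnets).
rack_for() {
  case "$1" in
    10.1.*) echo "/dc1/rack1" ;;
    10.2.*) echo "/dc1/rack2" ;;
    *)      echo "/default-rack" ;;
  esac
}

# Hadoop may pass a batch of addresses; answer each in order.
for host in "$@"; do
  rack_for "$host"
done
```

Any host not matching a known subnet falls back to /default-rack, which is the usual convention so that unmapped nodes still register.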
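For the hot/warm checkpoint replication Bobby describes, a common building block is DistCp, which copies data in parallel between clusters. A dry-run sketch; the namenode hostnames and paths are placeholders, not from this thread:

```shell
#!/bin/sh
# Hypothetical checkpoint copy from the active colo to the warm one.
# Hostnames and paths are placeholders, not from the thread.
SRC="hdfs://colo-a-nn:8020/data/checkpoints/latest"
DST="hdfs://colo-b-nn:8020/data/checkpoints/latest"

# -update only copies files that differ at the destination, so a
# re-run after an interrupted transfer only ships the remaining delta.
CMD="hadoop distcp -update $SRC $DST"

# Dry run: print the command instead of executing it.
echo "$CMD"
```

Scheduling this from cron (or whatever drives the checkpoint cadence) gives the warm cluster a known-good point to resume processing from if the active colo goes down.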